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In 2009, Oracle replaced the long-serving sorting algorithm in its Java 7 runtime 
library by a new dual pivot Quicksort variant due to Yaroslavskiy The decision was 
based on the strikingly good performance of Yaroslavskiy' s implementation in running 
time experiments. At that time, no precise investigations of the algorithm were available 

£S) to explain its superior performance — on the contrary: Previous theoretical studies of 

other dual pivot Quicksort variants even discouraged the use two pivots. Only in 2012, 
two of the authors gave an average case analysis of a simplified version of Yaroslavskiy's 

-^ algorithm, proving that savings in the number of comparisons are possible. However, 

^_ Yaroslavskiy's algorithm needs more swaps, which renders the analysis inconclusive. 

To force the issue, we herein extend our analysis to the fully detailed style of Knuth: 
We determine the exact number of executed Java Bytecode instructions. Surprisingly, 
Yaroslavskiy's algorithm needs sightly more Bytecode instructions than a simple imple- 

(—1 mentation of classic Quicksort — despite contradicting running times. Like in Oracle's 

c/3 library implementation we incorporate the use of Insertionsort on small subarrays and 

show that it indeed speeds up Yaroslavskiy's Quicksort in terms of Bytecodes. Even if 

using optimal Insertionsort thresholds the new Quicksort variant again needs slightly 

t-h more Bytecode instructions on average. 

Finally, we show that the (suitably normalized) costs of Yaroslavskiy's algorithm 
converge to a random variable whose distribution is characterized by a fix point equa- 
tion. From that, we compute variances of costs and show that for large n, costs are 

q" concentrated about their mean. 
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Quicksort is a divide and conquer sorting algorithm originally proposed by Hoare [ 1961a] [1961b) . 



. £h The procedure starts by selecting an arbitrary element from the list to be sorted as pivot. Then, 

^ Quicksort partitions the elements into two groups: those smaller than the pivot and those larger 

than the pivot. After partitioning, we know the exact rank of the pivot element in the sorted list, 
so we can put it at its final landing position between the groups of smaller and larger elements. 
Afterwards, Quicksort proceeds by recursively sorting the two partitions, until it reaches sublists 
of length zero or one, which are already sorted by definition. 

We will in the following always assume random access to the data, i. e. the elements are given as 
entries of an array. Then, the partitioning process can work in place by directly manipulating the 
array. This makes Quicksort convenient to use and avoids the need for extra space — Hoare's initial 
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1961b | works in place and Sedgewick 1 1975] studies several variants 



implementation IHoare 
thereof. 

In the worst case, Quicksort has quadratic complexity, namely if in every partitioning step, the 
pivot is the smallest or largest element of the current subarray However, this behavior turns out 
rare, such that the expected complexity remains "linearithmic," i. e. of order Q(nlogn). |Hoare 1 1962] 
already gives a precise average case analysis of his algorithm, which is nowadays contained in 
most algorithms textbooks, [e. g. |Cormen et aT] 2009|. Sedgewick [1975; 1977] refines this analysis 
to count the exact number of executed primitive instructions of a low level implementation. This 
detailed breakdown reveals that Quicksort has the asymptotically fastest average running time on 
MIX among all the sorting algorithm studied by Knuth [1998]. 

Only considering average results can be misleading: For having confidence in a sorting method, 
we also require that it is likely to observe costs close to the expectation. The standard deviation of 
the Quicksort complexity grows linearly with n | |Hennequin| |1989"| |Knuth[ 19981, which implies 
that the costs are concentrated around their mean for large n. Precise tail bounds which ensure 
tight concentration around the mean where derived by McDiarmid and Hayward [19961; see also 
IFill and Jansonl 120021 . 

Much more information is available on the full distribution of the number of key comparisons. 



When suitably normalized, the number of comparisons converges in law [Regnier, 1989], with a 



certain unknown limit distribution. |Hennequin| |1989 1 computed its first cumulants and proved 
that it is not a normal distribution. The limiting distribution can be implicitly characterized by 
a stochastic fix point equation [Rosier 1991| and it is known to have a smooth density [Fil l and| 
Janson] [2000} |Tan and Hadjicostas[|1995) . 

Due to its efficiency in the average, Quicksort has been used as general purpose sorting method 
for decades, for example in the C/C++ standard library and the Java runtime library. As sorting 
is a widely used elementary task, even small speedups of such library implementations can be 
worthwhile. This caused a run on variations and modifications to the basic algorithm. One very 
successful optimization is based on the observation that Quicksort's performance on tiny subarrays 
is comparatively poor. Therefore, we should switch to some special purpose sorting method for 
these cases [Hoare, 1962]. Singleton [ 1969] proposed using Insertionsort for this task, which indeed 
"for small n [ . . . ] is about the best sorting method known" according to Sedgewick [1975 1 p. 22]. He 
also gives a precise analysis of Quicksort where Insertionsort is used for sublists of size less than 
M [Sedgewick} |1977) . For his MIX implementation, the optimal choice is M - 9, which leads to a 
speedup of 14 % for n = 10000. 

Another very successful optimization is to improve the choice of the pivot element by selecting 
the median of a small sample of the current subarray. This idea has been studied extensively 
jChern an d Hwang, 2001 J |Durand| 12003} |Emden| [19701 |Hennequin] [1989| |Hoare] [1962} [Martinezj 
and Roura, 2001; Sedgewick, 1977; Singleton, 1969], and real world implementations make heavy 
use of it [Bentley and McIlroy nT993| ~ 

Precise analysis of the impact of a modification often helped in understanding and assessing 
its usefulness, and in fact, many proposed variations turned out detrimental in the end (many 
examples are exposed by Sedgewick [ 1975 1). Partitioning with more than one pivot used to be 
counted among those. Sedgewick [1975, p. 150ff] studies a dual pivot Quicksort variant in detail, 
but finds that it uses more swaps and comparisons than classic Quicksort [j] Later |Hennequin| 
[1991| considers the general case of partitioning into s > 2 partitions. For s - 3, his Quicksort uses 
asymptotically the same number of comparisons as classic Quicksort; for s > 3, he attests minor 
savings which, however, will not compensate for the much more complicated partitioning process 
in practice. These negative results may have discouraged further research along these lines in the 
following two decades. 



Interestingly, tiny adaptions make Sedgewick's dual pivot Quicksort competitive w. r. t. the number of comparisons; in fact 
it even needs only 28/15nhin + 0(n) comparisons |Wild 2012, Chapter 5], which is less than Yaroslavskiy's algorithm! 
Yet, the many swaps dominate overall performance. 



1. Introduction 

In 2009, however, Vladimir Yaroslavskiy presented his new dual pivot Quicksort variant at 
the Java core library mailing listjj After promising running time benchmarks, Oracle decided 
to use Yaroslavskiy's algorithm as default sorting method for arrays of primitive typqj in the 
Java 7 runtime library, even though literature did not offer an explanation for the algorithm's good 
performance. 

Only in 2012, Wild an d Nebel | 2012| made a first step towards closing this gap by giving exact 



expected numbers of swaps and comparisons for a simple version of Yaroslavskiy's algorithm. We 
will re-derive these results here as a special case. The surprising finding is that Yaroslavskiy's algo- 
rithm uses 1.9nlnn + 0(n) comparisons — asymptotically 5 % less than the 2nlnn + 0(n) comparisons 
needed by classic Quicksort. 

The reason for the savings is clever usage of stochastic dependencies: Yaroslavskiy's algorithm 
contains two opposite pairs of locations C& (lines 11 and 15 of Algorithm fl| and C g (lines 16 and 17) 
in the code where key comparisons are done: At C&, elements are first compared with the small 
pivot p (in line 11) and then with the large pivot q (in line 15) — if still needed, i. e. only if the 
element is larger than p. For C g it is vice versa. By the way partitioning proceeds, it happens 
that Ck is executed more often (than on average) if there are more elements smaller than p (than 
on average); similarly C g is visited often if there are many large elements. Consequently, the 
probability that one comparison suffices to determine an element's target partition is strictly larger 
than in a symmetric algorithm^] 

While the lower number of comparisons seems promising, Yaroslavskiy's dual pivot Quicksort 
needs more swaps than classic Quicksort, so the high level analysis remains inconclusive. In this 
paper, we extend our analysis to detailed instruction counts, complementing previous work on 
classic Quicksort [Sedgewick, 1977 1. However, instead of Knuth's slightly dated mythical machine 
MIX, we consider the Java Virtual Machine [Lindholm and Yellin, 19991 and count the number of 
executed Java Bytecode instructions. Wild 1 2012) gives similar results for Knuth's MMIX [Knuth 



20051, the successor of MIX. 

The number of executed Bytecode instructions has been shown to resemble actual running time 
JCamesi et al., 20061, even though just in time compilation can have a tremendous influence [Wild 



et al. 20131 and some aspects of modern processor architectures are neglected. 



Extending the results of Wild and Nebel [2012], the analysis in this paper includes Insertion 



sorting short subarrays. Moreover, all previous results on Yaroslavskiy's algorithm only concern 
expected behavior. In this article, we show existence and give characterizations of limit distri- 
butions. A comforting result of these studies is that the standard deviation grows linearly for 
Yaroslavskiy's algorithm as well, which implies concentration about the mean. 

This paper does not consider more refined ways to choose pivots, like selecting order statistics of 
a random sample. We defer a detailed treatment of Yaroslavskiy's algorithm under this optimization 
to a separate article (to appear soon). 



The rest of this paper is organized as follows. Section [LI] presents our object of study. In 
Section [2] we review basic notions used in the analysis later. We also define our input model and 
collect elementary properties of Yaroslavskiy's algorithm. In Section [3} we derive exact average 
costs in terms of comparisons, swaps and executed Bytecode instructions. These are used in 
Section|4]to identify a limiting distribution of normalized costs in all three measures, from which 
we obtain asymptotic variances. Finally, Section [5] summarizes our findings and puts them in 
context. 



see e. g. the archive on http : //permalink . gmane . org/gmane . comp . Java . open jdk .core- libs, devel/2 628 
Primitive types are all integer types as well as Boolean, character and floating point types. For arrays of objects, the 

library specification prescribe a stable sorting method, which Quicksort does not provide. Instead a variant of Mergesort 

is used, there. 

For details consider |Wild and Nebel 2012J or the recording of the corresponding talk at 

http : //www .slide share, net /sebawild/ average-case- analysis-of- Java- 7 s-dual- pivot -quicksort 
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Algorithm 1 Yaroslavskiy's Dual Pivot Quicksort with Insertionsort. 
Quicksort Yaroslavskiy (A, Uft,right) 

II Sort kUeft,..., right] (including end points). 

1 if right - left < M II i. e. the subarray has n<M elements 

2 lNSERTIONSORT(A,fe/it,rigAi) 

3 else 

4 if A[left]> Alright] 

5 p := Alright]; q := Alleft] 

6 else 

7 p := Alleft]; q := Alright] 

8 end if 

9 £ := left + 1; g := right-!; k ~ £ 
10 while & < g 

n ifA[&]<p 

12 Swap AM and AM 

13 ^ := ^ + 1 

14 else 

15 itAlk]>q 

16 while A[g] > q and £<gdog:=g-l end while 

17 if Alg] > p 

is Swap A[A] and A[#] 

19 else 

20 Swap A[&] and Alg]; Swap A[A] and AM 

21 ^:=^ + l 

22 end if 

23 g:=g-l 

24 end if 

25 end if 

26 k := k + 1 

27 end while 

28 £~£-l; g:=g + l 

29 A[Ze/?] := AM; AM := p // Swap pivots to Bnal position 

30 Alright] := A[g]| A[^] := q 

31 QUICKSORTYAROSLAVSKIY(A, left J -1) 

32 QuicksortYaroslavskiy(A, ^ + l,g - 1) 

33 QUICKSORTYAROSLAVSKIY(A,g + l,right) 

34 end if 



1.1. Yaroslavskiy's Algorithm 

Yaroslavskiy s dual pivot Quicksort is shown in Algorithm [l] The initial call to the procedure takes 
the form QuicksortYaroslavskiy(A, l,n), where A is an array containing the elements to be 
sorted and n is its length. After selecting the outermost elements as pivots p and q such that p < q, 
lines 9 — 30 of Algorithm [l] comprise the partitioning method. After that, all small elements, i. e. 
those smaller than p (and q), form a contiguous region at the left end of the array, followed by p 
and the medium elements. Finally q separates the medium and large elements. After recursively 
sorting these three regions, the whole array is in order. 

Yaroslavskiy's partitioning algorithm is an asymmetric generalization of Hoare's crossing 
pointers technique: The index pointers k and g start at the left and right ends, respectively, and 
are moved towards each other until they cross. Additionally, pointer £ marks the position of the 
rightmost small element, such that the array is kept invariably in the following form: 
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Our Algorithm [T] differs from Algorithm 3 of [Wild and Nebel, 2012] as follows: 



For lists of length less than M, we switch to InsertionSortJ^JA possible implementation is 
given in Appendix [B] The case M - 1 corresponds to not using Insertionsort at all. 

The swap of k[k] and k[g] has been moved behind the check k[g] > p. Thereby, we never 
use array positions in a key comparison after we have overwritten their contents in one 
partitioning step; see Fact[l]below 

The comparison in line 15 has been made non-strict. For distinct elements this makes no 
difference, but it drastically improves performance in case of many equal keys [Wild 2012| 



p. 54]. The reader might find it instructive to consider the behavior on an array with all 
elements equal. 

Note that partitioning an array around two pivots is similar in nature to the Dutch National 



Flag Problem (DNFP) posed by Dijkstra 1 19761 as a programming exercise: 



Given an array of n red, white and blue pebbles, rearrange them by swaps, such that 
the colors form the Dutch national Hag: red, white and blue in contiguous regions. Each 
pebble may be inspected only once and only a constant amount of extra storage may be 
used. 

Dijkstra assumes an operation "buck" that tells us an element's color in one shot, so any algorithm 
must use exactly n buck-operations. Performance differences only concern the number of swaps 
needed. 

Interestingly, |Meyer| [ |1978| | gave an algorithm for the DNFP which is essentially equivalent 
to Yaroslavskiy's partitioning method. Indeed, it even outperforms the algorithm proposed by 
Dijkstra [McMaster, 1978 1 ! Yet, the real advantage of Yaroslavskiy's partitioning scheme — the 
reduced expected number of key comparisons — is hidden by the atomic buck operation; its potential 
use in Quicksort went unnoticed. 

2. Preliminaries 

In this section, we recall elementary definitions and collect some notation and basic facts used 
throughout this paper. 

By J€ n := £" =1 Hi, we denote the rath Harmonic Number. We use 8ij for the Kronecker delta, 
which is defined to be 1 if i - j and otherwise. We define xln(x) = for x - 0, so that x >->■ xln(x) 
becomes a continuous function on [0,oo). 

The probability of an event E is denoted by P[E] and we write \e) for its indicator random 
variable, which is 1 if the event occurs and otherwise. For a random variable X, by E[X], Var(X) 
and 5£ (X) its expectation, variance and distribution are denoted, respectively. X - Y means that X 
has the same distribution as Y. 

By \\X\\ P := E[|X| p ] 1/p , 1 < p < oo, we denote the L p -norm of random variable X. For random 
variables Xi,X%, ■ ■ ■ and X, we say X n converges in L p to X 

X n ^ X iff lim||X„-X||„ = 0. 

The Bernoulli distribution with parameter p is written as B(p). Provided that Y. b r=1 Pr - 1 and 
b > 1 is a fixed integer, we denote by M(n;pi,...,Pb) the multinomial distribution with n trials 
and success probabilities pi,...,Pb £ [0,1]. For random probabilities V - (Vi,...,V&), i. e. random 
variables < V r < 1 (r = 1, . . . , b) with E* = i V r = 1 almost surely, we write 

Y = M(n;Vi,...,V b ) (2.1) 



Note that even if Sedgewick 1 1977 1 proposes to use one final run of Insertionsort over the entire input array, modern 
cache hierarchies suggest to immediately sort small subarrays as done in our implementation. 



2. Preliminaries 

to denote that Y conditional on V - v (i. e. conditional on (Vi, . . . ,Vb) - (v±, . . . ,vt,)) is multinomially 
M(n;vi,...,Vb) distributed. 

For k, r, b e N satisfying k<r + b, the hypergeometric distribution with k trials from r red and 
b black balls is denoted by HypG(£, r,r + b). Given an urn with r red and b black balls, it is the 
distribution of the number of red balls drawn when drawing k times without replacement. The 
mean and variance of a hypergeometrically HypG(& ,r,r + b) distributed random variable G are given 
by | |Kendall| [19451 p. 127] 

UG] = *.-b, Var(G) = , ^ * A ■ (2.2) 

r + b (r + b) z (r + b - 1) 

As for the multinomial distribution, given random parameters K, R in {0, ... , n] we use 

Y = HypG(K,R,n) (2.3) 

to denote that Y conditional on (K,R) = (k,r) is hypergeometrically HypG(&,r, n) distributed. 

2.1. Input Model 

We assume the random permutation model: The keys to be sorted are the integers l,...,n and each 
permutation of {l,...,n} has equal probability 1/re! to become the input. Note that we implicitly 
exclude the case of equal keys by that. 

As sorting is only concerned with the relative order of elements, not the key values themselves, 
we can equivalently assume keys to be i. i. d. real random variables. Equal keys do not occur almost 
surely and the ranks of the elements form in fact a random permutation of {l,...,n} again [e.g. 
Mah moud|200 0|. For the analysis of Section |4j this alternative point of view will be helpful. 



2.2. Basic Properties of Algorithm |T] 

As typical for divide and conquer algorithms, the analysis is based on setting up a recurrence 
relation for the costs. For such a recurrence to hold, it is vital that costs for subproblems of size k 
behave the same as the costs for dealing with an original random input of initial size k. For 
Quicksort, we require the following property: 

Property 1 (Randomness Preservation). 

If the whole array is a (uniformly chosen) random permutation of its elements, so are the subarrays 

Quicksort is recursively invoked on. 



|Henneqxun] fl989 1 showed that Property [T]is implied by the following property. 



Property 2 (Sufficient Condition for Randomness Preservation). 

Every key comparison involves a pivot element of the current partitioning step. 

Now, it is easy to verify that Yaroslavskiy's algorithm fulfills Property [2] and hence Property [Tj 

Since Yaroslavskiy's algorithm is an in-place sorting method, it modifies the array A over time. This 
dynamic component makes discussions inconvenient. Fortunately, a sharp look at the algorithm 
reveals the following fact, allowing a more static point of view: 

Fact 1. The array elements used in key comparisons have not been changed since the beginning of 
the current partitioning step. More precisely, if a key comparison involves an array element A[z], 
then there has not been a write access to A[i] in the current partitioning step. □ 



3. Average Case Analysis 

3. Average Case Analysis 

In this section, we assume that array A stores a random permutation of {1, . . . , n}. 

3.1. The Dual Pivot Quicksort Recurrence 

In this section, we obtain a general solution to the recurrence relation corresponding to dual pivot 
Quicksort. We denote by C„ the expected costs — where different cost measures will be inserted 
later — ofYaroslavskiy's algorithm on a random permutation of {l,...,re}. C n decomposes as 

C„ - costs of first partitioning step + costs for subproblems . (3.1) 

As Yaroslavskiy's algorithm satisfies Property [l] the costs for recursively sorting subarrays can be 
expressed in terms of C with smaller arguments, leading to a recurrence relation. Every (sorted) 
pair of elements has the same probability l/Q) of becoming pivots. Conditioning on the ranks 
of the pivots, this gives the following recursive form for the expected costs C„ of Yaroslavskiy's 
algorithm on a random permutation of size n: 

[T B + l/g) £ (C,_i + C,_,_i + C B _,) for«>M 
C n = < l<p<g<« , (3.2) 

[ Cf for n < M 

where C n s denotes the expected costs of iNSERTlONSoRTing a random permutation of {l,...,re} and 
T n is the expected cost contribution of the first partitioning step. This function T n quantifies the 
"toll" we have to pay for unfolding the recurrence once, therefore we will call T n the toll function of 
the recurrence. By adapting the toll function, we can use the same recurrence to describe different 
kinds of costs and we only need to derive a general solution to this single recurrence relation as 
provided by the following theorem. 



,, , ,, , .. , forn>M + 3. (3.3) 

M+3) (M+4\ (M+i\ ' 



Theorem 1. Let C n be recursively defined by ( |3.2| l. Then, C n satisfies 
Cn = ^t (j) £ (T J+ 2 ~ $Tj +1 + JLtj) 

[4! i=M+A j=M+2 ( 2 1 

(M+3\ _ (M+i\ (M+i\ 

i (n+1 , ' 4 1 i 5 ) j^ M-l fn+1 [ 5 ) \r, 

+ {~~5~ + m j L M+3 ~ MT3{~5 TEy}^M+2, 

As an immediate consequence, C n depends linearly on the toll function T n . 

The proof for Theorem [l] uses clever differences, details are deferred to Appendix|A| This general 
solution still involves non-trivial double sums. For the cost measures we are interested in, the 
following proposition gives an explicit solution for ( |3.2| l. 



Proposition 1. Let C n be recursively defined by ( |3.2| l and let T n -an + b for n>M+l. Then, C n 
satisfies 

C n = lain + l)[je n+1 - J? M+ 2) + \(n + l)(f a + *$=£) + &£ 

t * W-2k IS ( M 5 +4 ) ff forn>M + 3, (3.4) 

+ g(n + l)^ M+2 C k + -^-R M , 

where 

P - 6„ , 2(q-6) , 56-17a M-\ n , M-X n 

K-M ~ 5° + ~W^3~ + 2(M+4) MT4^M+3 + m^^M+2 ■ 

IfC 1 ^ - for all n, C n has the following asymptotic representation: 

C n = fanlnn + (§§a + W)re + falnra + {^§a-\b+W) + 0{\), n^oo, (3.5) 
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where 

W = Uar+^ 2 -a^ M+ 2) 

and j~ 0.57721 is the Euler-Mascheroni constant. 

Moreover, if the toll function T n has essentially the form given above, but with T2 = ^ 2a + 6, we 
get an additional summand -8mi • ^j(2a + b) -{n + 1) in ( |3.4| l. Equation ( |3.5| l remains valid if we set 

W=l{a r +^-aJ?M + 2-8 M i(2a + b)/10). 

The proof of Proposition [l] is basically "by computing", details again are deferred to Appendix [A] 

3.2. Basic Block Execution Frequencies 

In this section, we compute for every single instruction of Yaroslavskiy's algorithm how often 
it is executed in expectation. Based on that, we can easily derive the expected number of key 
comparisons, swaps, but also more detailed measures, such as the expected number of executed 
Bytecode instructions. This is the kind of analysis Knuth popularized through his book series The 
Art of Computer Programming IKnuth, 1998J. A corresponding analysis of classic single pivot 
Quicksort was done by Sedgewick [1977]. Like the Quicksort variant discussed there, Algorithm]!] 
uses Insertionsort for sorting small subarrays. A detailed implementation and corresponding 
analysis of Insertionsort is given in Appendix |B| 

Trivially, consecutive lines of purely sequentiajj code have the same execution frequencies. 
Contracting maximal blocks of such code yields the control flow graph (CFG) of an algorithm. 
Figure[l]shows the resulting CFG for Yaroslavskiy's algorithm. Simple flow conservation arguments 
allow to express execution frequencies of some blocks by the frequencies of others: The execution 
frequencies of the 20 basic blocks of Figure (Tlonry depend on the following nine frequencies: A, B, R, 
F, C (1) , C (3) , C (4) , «S (1) and «S <3) . The name C® indicates that this frequency counts executions of the 
ith location in the code of Algorithm [TJ where a key comparison is done. Similarly, S M corresponds 
to the ith swap location. 

The results derived in this section are summarized in Table[l]on page 11 and Table[2]on page 14 



The former lists the toll functions T n , for the frequencies. The latter gives the results of applying 
the general solution to ( |3.2| l. 



The expected execution frequencies allow a recursive representation of the following form, here 
using the example of C (1) : 

[E[Tco>(»)] + 1/g) E Kil + HC^il + BCi 1 ',]) for»>M 
EEC™] = I i< P <g<« , (3.6) 

[ for n < M 

where TqW - T&uin) is the frequency specific toll function — namely the corresponding frequency 
during the first partitioning step only. For the other frequencies, we similarly denote by Ta,Tf, Tc®), 
Tqw, TgiD and Tgo) the toll functions corresponding to A, F, C <3) , C (4) , S m and S <3) , respectively. 

In the rest of Section |3| we are only interested in expected costs, therefore we will sloppily use A, 
B, R, F, C (1) , C (3) , C (4) , S™ and S (3) to denote the expected execution frequencies. Later, in Section|i] 
where the distinction becomes important, we reserve A, B etc. for the actual random variables and 
explicitly write E[A], E[B] etc. for their expectations. 

The frequencies F, C (1) , C (3) , C (4) , <S (1) and «S (3) correspond to basic blocks in the body of the main 
partitioning loop, i. e. blocks [8]- 18 All these blocks have in common that they are not executed 



at all during calls with right - left < 1, i. e. when n < 2: In that case we have k > g directly after 
block [6] and hence immediately leave the partitioning loop from block [7] to block [19] Therefore, 

Purely sequential blocks contain neither (outgoing) jumps, nor targets for (incoming) jumps from other locations, except 
for the last and first instructions, respectively. 
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S (3) 
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19 



A[Ze/i] := AM; All] ■- p; 

Alright] ■- Alg]; Alg] := q; 

QUICKSORTYAROSLAVSKIY(A, left ,t-\). 

> QUICKSORTYAROSLAVSKIY(A, 1 + l,g - 1): 

- > QUICKSORTYAROSLAVSKIY(A,g+l,rig/i«): 



B-A 



lNSERTIONSORT(A,Ze/i;,rigA(); 



20 



.Return; 



> 777^ 

Figure 1: Control flow graph for Algorithm[T| The algorithm is decomposed into basic blocks of purely 
sequential code. Possible transitions from one block to another are indicated by arrows. Blocks 
with two outgoing arrows end with a conditional, the "yes" path is taken if the condition is 
fulfilled, otherwise the "no" transition is chosen. We refer to blocks using the number shown 
in the upper left corner. In the upper right corner, a block's symbolic execution frequency is 
given. For clarity of presentation, the recursive calls in blockQ9]are not explicitly shown, but 
only sketched by the dashed arrows. Block[2|calls InsertionSort given in AppendixlBJ 



3. Average Case Analysis 

we have 7>(2) = T C (i)(2) = T c m{2) = T c w(2) = T s m(2) = T s <a)(2) = 0. In the subsequent sections, we will 
determine the toll functions for n > 3. 

The following expectations are used several times below, so we collect them here. 
Lemma 1. E[p] = \(n + 1) and E[q] - §(re + 1). 
Proof: Conditioning on p and q, we find 

l<p<g<n I2J I2J 9=2 p = l 

A similar calculation for p proves the lemma. □ 

3.2.1. The Crossing Point Lemma 

The following lemma is the key to precise analysis of the execution frequencies that depend on how 
pointers k and g "cross". As the pointers are moved alternatingly towards each other, one of them 
will reach the crossing point first — waiting for the other to arrive. 

Lemma 2 (Crossing Point Lemma). Let A store a random permutation of {!,... ,n} with n>2. Then, 
Algorithm^leaves the outer loop of the first partitioning step with 

k = q + 8 = #+1 + 5 for 8 £{0,1}. (3.7) 



(More precisely, ( |3.7| l holds for the valuations ofk, g and q upon entrance of block 19) 



Moreover, 8-1 iff initially k[q] > q holds, where q - max{A[l], A[re]} is the large pivot. 

Proof of Lemma^j Between two consecutive "k < ^"-checks in block[7j we move k and g towards 
each other by at most one position each; so we always have k < g + 2 and we exit the loop as soon as 
k > g holds. Therefore, we always leave the loop with k-g+1+8 for some 8 e {0, 1}. In the end, q is 
moved to position g in block [19] Just above in the same block, g has been incremented, so we have 



g-q-1 upon entrance of block 19 



For the "moreover" part, we show both implications separately. Assume first that 8 = 1, i.e. the 
loop is left with a difference of 8+ 1 - 2 between k and g. This difference can only show up when 
both k is incremented and g is decremented in the last iteration. Hence, in this last iteration we 
must have gone from block [10] to [TT] and accordingly k[k] > q must have held there — and by Fact[T] 



k[k] still holds its initial value. 

In case k < n, even strict inequality k[k] > q holds since we then have k[k] jt k[n] - q by the 
assumption of distinct elements. Now assume towards a contradiction, k-n holds in the last 
execution of block 10 Since g is initialized in block[6]to right- 1-n-l and is only decremented in 



the loop, we have g <n-l. But this is a contradiction to the loop condition "k < g": n-k<g<n-l. 



So, k[k] > q for the last execution of block 10 



By assumption, 5=1, so k - q + 1 upon termination of the loop. As k has been incremented 
exactly once since the last test in block [10] we find A[^] > q there, as claimed. 

Now, assume conversely that initially k[q] > q holds. As g stops at q - 1 and is decremented in 
block[l7j we have g - q for the last execution of block 11. Using the assumption yields k[g] - k[q] > q, 
since by Fact[T] k[q] still holds its initial value. Thus, we take the transition to block [12] Execution 



then proceeds with block 14 otherwise we would enter block 11 again, contradicting the assumption 



that we just finished its last execution. The transition from block 12 to[14|is only taken if k > g - q. 



With the following decrement of g and increment of k , we leave the loop with k>g + 2, so 8 -1. D 

Corollary 1. For 8 from Lemma[2]holds E[S] = | and E[8\p,q] = ^f. 

Proof: We first compute the conditional expectation. As 8 e {0, 1}, we have E[8\p,q] = P[8 - 1 \p,q], 
so it suffices to compute this probability. Now by Lemma[2j we have P[8 - 1 \p,q] - P[k[q] > q\p,q\. 
We do a case distinction. 
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3. Average Case Analysis 



Toll function T A T F T c m T c m T c w T s m T s <& 

Expected value 1 g | ra_1 3™~1 h n ~h \ n ~T2 T2 n ~\ 

Special value for n = 2 no 

Table 1 : Expected values for the toll functions of execution frequencies that characterize the block 
execution frequencies of all blocks in AlgorithmR] 

• For q < re, Alq] is one of the non-pivot elements. (We have 1 < p < q < re.) Any of the n - 2 
non-pivot elements can take position k[q], and among those, n-q elements are strictly greater 
than q. This gives a probability of |^| for Alq] > q. 

• For q - n, q is the maximum of all elements in the list, so we cannot possibly have Alq] > q. 
This implies a probability of = ^|. 

By the law of total expectation, the unconditional expectation is given by: 

E[S] = £ P[pivots(p,q)]-E[8\p,q] = 1/g) £ ^ 

l<p<g<n l<p<q<n 



©(n-2)l ls ,?,j; ijzjr) Q(n-2) U ^ n 



§(« + !) re-|(re + l) x 



re-2 " 3 ' 

□ 



3.2.2. Frequency A 

The frequency A-A n equals the expected number of partitioning steps or equivalently the number 
of (recursive) calls with right - left > M when initially calling QuiCKSORTYAROSLAVSKIY(A, 1, re) 
with a random permutation stored in A. Therefore, the contribution T& of one partitioning step is 
TaM - 1. By Proposition [j] with T n = 1 and Cjp = 0, we obtain the closed form 

A* = g(ilf2)(" + 1 )-l + i( M 4 +1 )/(4)- (3.8) 

3.2.3. Frequency R 

By i? =i?„, we denote the expected number of calls to QuiCKSORTYAROSLAVSKIY including those 
directly passing control to INSERTIONSORT for short subarrays. Every partitioning step entails 
three additional recursive calls on sublists (see block |19|). Moreover, we have one additional initial 
call to the procedure. Together, this implies 

R n = 3A„ + 1. (3.9) 

3.2.4. Frequency B 

Frequency B counts how often we execute block [4] This block is reached at most once per partition- 
ing step, namely iff k[left] > Alright], For random permutations, the probability for that is exactly 
1/2, so we find 

B n = \A n . (3.10) 

3.2.5. Frequency C (1) 

C (1) (re) denotes the expected execution frequency of block[8]of Yaroslavskiy's algorithm. Block[8]is 
the first statement in the outer loop and the last block of this loop (block [18]) is the only place where 
k is incremented. Therefore, Tcm is the number of different values that k attains during the first 
partitioning step. The following corollary quantifies this number as Tcm -q-2 + 8. 
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3. Average Case Analysis 

Corollary 2. Let us denote by X the set of values pointer k attains at block [8] Similarly, let 'S be 
the set of values of g at block [Tl] We have 



X = {2,2,,..., q- 1 + 8], \X\ = q-2 + 8, 

<& = {n-l,n-2,...,q + l,q}, \<S\ = n-q. 

Proof: By Lemma [2] we leave the outer loop with k-q + 8 and g - q - 1. Since the last execution 
of block[8j k has been incremented exactly once (in block [18), so the last value of k, namely q + 8, is 
not observed at block [8} Similarly, after the last execution of block [TH we always pass block [17 



where g is decremented. So the last value q - 1 for g is not attained in block [11] □ 

Continuing with frequency C (1) , note that q and 8 and hence T^xt -q-2 + 8 are random variables. 
By linearity of the expectation E[Tc<i>] = E[g]-2 + E[fl] holds, so with Lemma[l]and Corollary[l] we 
find 

E[T c u)(re)] = \(n + l)-2+ \ = |n-l. (3.11) 

3.2.6. Frequency S (1) 

Frequency «S (1) corresponds to block [9] Block [9] is executed as often as block [8] is reached with 
A[k] < p. This number depends on the input permutation: Tsm(n) is exactly the number of elements 
smaller than p that happen to be located at positions in X, the range pointer k scans. Denote this 
quantity by s@X. 

Lemma 3. Conditional on the pivot valuesQp and q, s@Xis hypergeometrically HypG(p- l,q- 
2,n-2) distributed. 

Proof: This is seen by considering the following (imaginary) generation process of the current 
input permutation: Fixing the pivot values p and q, we have to generate a random permutation of 
the remaining re - 2 elements E := {1, . . . , re} \ {p, q). To do so, we first choose a random subset S of the 
free positions F := {2, . . . ,n - 1} with |S| = p - 1. Then we put a random permutation of {1, ... ,p - 1} 
into positions S and a random permutation of E \ {1, . . . ,p - 1} into positions F\S. It is easily checked 
that this generates all permutations of E with equal probability, if all choices are done uniformly. 

Then by definition, s@X = \Sr\Jf\. This seemingly innocent equation hides a subtle intricacy 
not to be overlooked: JT = [2, . . . , q - 1 + 8} (Corollary [2) is itself a random variable which depends 
on the permutation via 8. Luckily, the characterization of 8 from Lemma [2] allows to resolve this 
inter-dependence. X - [2, . . . , q] if k[q] > q and J£ - [2, . . . , q - 1} otherwise. Stated differently, we get 
the additional position q in ,X iff the element at that position is large, which means position q 
never contributes towards small elements at positions in ,X. As a result, s@Jf - s@,X' = \S n J£'\ 
for J£' = {2,.. . ,q - 1}, which is constant for fixed pivot values p and q. 

Drawing positions S for small elements one by one is then equivalent to choosing |S| balls out of 
an urn with re - 2 balls without replacement. If | JT '| of the re - 2 balls are red, then s@Jf equals the 
number of red balls drawn, which is hypergeometrically 

HypGK|S|,|J?r'|,re-2) = HypG(p - 1, q - 2, re - 2) 

distributed by definition. □ 

The mean of hypergeometric distributions from ( |2.2| l translates into the conditional expectation 
E[s@ ,X \p, q] =(p- l)(q-2)/(n-2). By the law of total expectation, we can compute the unconditional 
expected value: 

E[T s u>(re)] = E[s@JH = E iPtq) [E[s@X\p,ql] = l/g) £ (p ~n%~ 2) = \n-&. (3.12) 

l<p<g<ra 

Actually, it is the ranks of p and q that matter. Since for permutations of {!,..,, n} ranks and values coincide, we will not 
dwell on the difference here for conciseness of presentation. 
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3. Average Case Analysis 



3.2.7. Frequency C (3) 

Block |11| — whose executions are counted in C (3) — compares A[g] to q. After every execution of 
block |11| pointer g is decremented: depending on whether we leave the loop or not, either in 
block [13] or in block 17 Therefore, we execute block 11 for every value g attains at block [TT| which 
by Corollary [2] amounts to Tc(v(n) -\ c S\-n-q. Using Lemma[l] we find 



E[T c c»(ra)] 



(3.13) 



3.2.8. Frequency^ 



Frequency F counts how often we take the transition from block 12 to block 14 This transition is 
taken when we exit the inner loop of Yaroslavskiy's algorithm because the second part of its loop 
condition, "k < g", is violated, which means we had k > g. 



After this has happened, we always execute blocks [17] and [18} where we decrement g and 
increment k. Moreover by Lemma[2] k is at most g + 2 after the loop, and equality holds iff 8-1. So 
at block[l2] we always have k < g, which means the violation of the loop condition occurs for k - g 
and can only happen in case 8-1. 

We can also show that it must happen whenever 5=1: By Lemma [2] we have A[q] > q, and 
k-q+1- g+2 after the loop. Therefore, during the last iteration of the loop, g-k-q and hence 
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k[k] - k[g] - k[q] > q holds. As a consequence, execution always proceeds through blocks [8_ 
and 11 to block [12] There, "k < g" is not fulfilled, so we take the transition to block [14] Together, we 
obtain Tp-8 and Corollary [l] gives 



E[T F (n)l 



(3.14) 



3.2.9. Frequency C (4) 



Frequency C (4) corresponds to block 14 which compares A[g] to p. From the control flow graph, it 
is obvious that C (4) is the sum of the frequencies of the two incoming transitions, namely block [TT] 



to 14 and block 12 to 14 The latter is exactly F. 



For the former, recall from above that block[TT]is executed once for all values <& = {n-l,n-2,...,q] 
that pointer g attains there. The transition from block 11 to block 14 is taken iff A[g] <q. As 
1 < g < n holds and all elements are distinct, A[g] = p cannot occur. Therefore, exactly the small and 
medium elements that are located at positions in ^ cause this transition; denote their number by 
sm©^. A very similar argument as in the proof of Lemma [3] shows that conditional on p and q, 
sm©^ is hypergeometrically HypG(^ - 2, n - q, n - 2) distributed. 

Adding both contributions yields Tew - S + sm© 1 ^. Using Corollary [l] and equation ( |2.2| l shows 



E[Tcw>(n)] = | + 1/g) £ 



(q-2)(n-q) 



= c-n- 



l<p<<7<rc 



(3.15) 



3.2.10. Frequency S <3) 

Frequency «S (3) counts executions of block[l6] Key to its analysis are the following two observations: 

1. Block[l6]and block[9](with frequency «S (1) ) are the only locations inside the loop where pointer 
£ is changed. Therefore, 7g(u + Tgo) = \5£\ - 1, where !£ is the set of values pointer I attains 
inside the loop (minus one as we leave the loop after the last increment of £ without executing 
blocks [9] and [16] again) . 



2. In block[19] we move the small pivot to k[£], so £ - p must hold there. Just above the swap, £ 
is decremented, so the last value of/ in the loop has been p + 1. Moreover, £ is initialized to 2 
(block[6), so Se = {2, . . „p + 1}. 
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3. Average Case Analysis 



Frequency M = 1 (exact solution for n > 4) M > 2 (asymptotic with error term 0{\)) 



A 


h~ 


i 

TO 




B 


h~ 


1 
20 




R 


g» + 


7 
10 




F 


lV" 


1 
" 15 




C (l) 


| (71 + l)^f„ - 


50 n 


2 
75 


C (3) 


| (71 + l)^f„ - 


~ 1« + 


1 
50 


C (4) 


£ (» + l)^f„ - 


39 n 
TOO™ 


7 
300 


S (l) 


^(n,+ l)je n - 


127 
21)0" 


1 
600 


S (3) 


±(n + l)je n - 


200 ™ + 


13 
600 



3 ,(77 + 1)- 1 



5(M+2) 



5(M+2) V " ' *-' 4 
5(M+2)^ + 1 ^~ 2 

g(M+2)(rc+l)-g 



|(77 + l)(^P„ +1 -^k +2 ) + ( i-j^ )(»+«+ 1 

§ (77 + 1)(^P B+1 - ^ M+2 ) + ( % - 5^ )(„ + 1) + | 

l(n + l)(^ + i-^ M+2 ) + (^j-5^)(n+l)+| 

TO (77 + l)(^f„+i -^°M+2) + (too " 5 (M+2) ) ( " + 1)+ 3 
^(77 + l)[je n+1 -je M+2 ) + (300 - 5(M+2) ) ( " + 1)+ 6 



Table 2: Expected execution frequencies characterizing all block execution frequencies of Figure [T| 
Those immediately follow from Proposition [T] and the toll functions of Table[T| For M= 1, we 
give exact expectations (valid for 77 > 4), for M > 2 we confine ourselves to (extremely precise) 
asymptotics. Note that exact expectations can be computed using equation \3A) if needed. 



Together, this implies Tgo) =p - 1 - s@.X and by ( |3.12| l and Lemma[T} 

E[T s <3)(7i)] = E[p] - 1 - E[s@jn = |(» + l)-l-(|n-&) = H»~i- (316) 

3.3. Key Comparisons 

Theorem 2. Jrc expectation, Yaroslavskiy's algorithm (Algorithm^ uses 

f ^(77 + l)(^f„ + l-^ M+2 ) 

for M > 2 

C„ = +(W + ^-5TM-5W2)^M + i)(» + l)+|+0(i) (3.17) 

l^(» + lW?»-§5»-^ /brM=l,7i>4 

&ey comparisons to sort a random permutation of size n. 



Proof: Key comparisons in the partitioning loop happen in basic blocks [3] [8j [lOj [IT] and 14 
Together this amounts to 

C^ = C^ + (C^-S^) + Cf + Cf + A n 



19 
10 

19 5 (n + l)^f„- gjn- ^ forM=l,n>4 



(77 + l)(J*f B+ i - ^k +2 ) + (fro - 57jraj)(» + 1) + I + O(^) for M > 2 



comparisons in expectation, where the second equation follows by summing the results from 

Table H 

For M > 2, we get additional comparisons from INSERTIONSORT, see Appendix [B] for details: 

C« S) = E n+ D n = {l (M + 3) +w ^- ) - m ^ ) ^ M+ i)(n + l). 

Summing both contributions yields ( |3.17| l. D 
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3. Average Case Analysis 



3.4. Swaps & Write Accesses 

Theorem 3. In expectation, Yaroslavskiy's algorithm performs 



S n 



(re + V)[je n+1 - je M+2 ) + (±§ + gp^2j)(/i + 1) - | + 0(£) for M > 2 



Un + \)J€ n 



100 ' 



61 
300 



for M = 1, re > 4 



(3.18) 



swaps in partitioning steps while sorting a random permutation of size re. 

Including the ones done in INSERTIONS ORT on small subarrays, Yaroslavskiy's algorithm uses 



W n 



Yq(« + l)(<^f„+i - &M+2) 



forM>2 



+ V75 + 20 m + 5(M+1) 5(M+2)) Kn + 1> 6 + U l^J 

±±(re + l)^„-±§re-fg forM=l,n>4 

write accesses to the array to sort a random permutation. 



(3.19) 



Proof: We find swaps in the partitioning loop of Yaroslavskiy's algorithm in basic blocks [9] [15} 
16 and 19 where blocks 16 and 19 each contain two swapsjj Hence, the total number of swaps 
during all partitioning steps is given by 



g(QS) 



j(l) 



+ (C 



(4). 



;(3) 



S w ) + 2S W + 2A . 



(3) 



Now, ( |3.18| l follows by inserting the terms from Table [2] 

A clever implementation realizes the two consecutive swaps in block 16 with only three write 
operations, see for example Appendix [C] This yields an overall number of 

W ( ® S) = 2S™ + 2{C { ^-Sf) + 3S™ + 4A n 

_ |ii(« + l)(^f n+ i-^M + 2) + (in + ] ^2)(« + l)-| + 0(^) forM>2 

^(n + lX^-i^n-li forM=l, re>4 
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write operations during the partitioning steps. The contribution from INSERTIONSORT is (cf. 
Appendix [B): 



AIS) 



18 



36 



Wr = E n + (G n -I n ) = (|(M + 3) +5(M+1) 5(M+2) 



)(re + l). 



n 



Adding both together we obtain ( |3.19| l. 

3.5. Executed Java Bytecode Instructions 

Theorem 4. In expectation, the Java implementation of Yaroslavskiy's algorithm given in Ap- 
pendix \C\executes 



?§(n + Y)[je n+1 -je M+ 2) 



M>2 



BC n = { 



, f 4259 51 t/r 72 317 48 qt> V*, _i_ 11 181 , fif 1 1 

+ IT50" + 20^ + MTT " WMT2) ~ 5(MT2)'^M+l)(n + 1) - T2 - + U[-^j 

^1 (-n ±1\3P 1993 „ 2009 „ -> /( M-1 

-jQ-(n + i)<J? n - -goo"' 1 - "goo" re > 4, M - 1 

Java Bytecode instructions to sort a random permutation of size re. 



(3.20) 



Note that Wild and Nebel 12012] included an additional contribution of B for swapping the pivots if they are out of order. 
However, this swap can be done with local variables only, we do not need to write the swapped values back to the array. 
Therefore, this swap is not counted in this paper. 
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3. Average Case Analysis 
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Figure 2: The expected number of executed Bytecodes for InsertionSort and Yaroslavskiy's algo- 
rithm with different choices for M. The numbers of Bytecodes shown are normalized by n, i. e. 
we show the number of executed Bytecode instructions per element to be sorted. The data 
was obtained by nai'vly evaluating the recurrence. 

Proof: By counting the number of Bytecode instructions in each basic block and multiplying it 
with this block's frequency, we obtain: 

BC n = 71A -1B + 6R + 15C (1) + 10C (3) + 11C (4) + 9S (1) + 8S (3) + 3F + 4D + HE + 20G - II . 

For details, see Appendix[Cj Inserting the expectations from Table [2] results in ( |3.20) . 

The Bytecode count for M - 1 corresponds to entirely removing INSERTIONSORT from the code. 
InsertionSort would execute 13 Bytecodes even on empty and one-element lists until it finds out 
that the list is already sorted. These obviously superfluous instructions are removed for M = 1. □ 

3.6. Optimal Choice for M 

The overall linear term for the expected number of comparisons is 

(-124 3 w 37 62+19M rag V„ . 11 

I 75 + 20 JK/ 10(M+2) W(M+2)^ CM+1 ) Kn ~ l ~ *■' ■ 

This coefficient has a proper minimum at M - 5 with value -3.62024 Compared with -711/200 = 

-3.555 for M - 1 this a minor improvement though. 

For the number of write operations, the linear term is 

26 . 18 



86 , iu. 
75 T 20^ 



■ + 



5(M+2) ' 5(M+1) 10 



&#k+2)(» + l). 



The minimum -1.0983 of this coefficient is also located at M - 5. The corresponding coefficient for 
M - 1 is -0.695. This improvement is more satisfying than the one for comparisons. 

The linear term for the number of executed Bytecodes of Yaroslavskiy's algorithm with M > 2 
attains its minimum -16.0887... at M - 7. This is a significant reduction over -9.965..., the linear 
term without iNSERTIONSORTing. Figure [2] shows the resulting expected number of Bytecodes 
for small lists. For n < 20, using INSERTIONSORT results in an improvement of over 10%. For 
n = 100 we save 6.3%, for n = 1000 it is 4.2% and for n = 10000, we still execute 3.1% less Bytecode 
instructions than the basic version of Yaroslavskiy's algorithm. 

It is interesting to see that both elementary operations favor M - 5, but the overall Bytecode 
count is minimized for "much" larger M - 7. This shows that focusing on elementary operations can 
skew the view of an algorithm's performance. Only explicitly taking the overhead of partitioning 
into account reveals that INSERTIONSORT is significantly faster on short subarrays. 
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4. Distribution of Costs 

Remark The actual Java 7 runtime library implementation uses M - 46, which seems far from 
optimal at first sight. Note however that the implementation uses the more elaborate pivot 
selection scheme tertiles of five I Wild e t al.||2013} , which implies additional constant overhead per 



partitioning step. 

4. Distribution of Costs 

In this section we study the asymptotic distributions of our cost measures. We derive limit laws 
after normalization and identify the order of variances and covariances. In particular, we find that 
all costs are asymptotically concentrated around their mean. 

As we confine ourselves to asymptotic statements of first order (leading terms in the expansions 
of variances and covariances), it turns out that the choice of M does not affect the results of this 
section: All results hold for any (constant) M (see [Neininger] 2001| proof of Corollary 5.5] for 
similar universal behavior of standard Quicksort). 

4.1. The Contraction Method 

Our tool to identify asymptotic variances, correlations and limit laws is the contraction method. 
For the reader's convenience we formulate a general convergence theorem from the contraction 
method that is used repeatedly below and sufficient for our purpose. Let (X„)„>o denote a sequence 
of centered and square integrable random variables either in U or R 2 whose distributions satisfy 
the recurrence 

X n i f,Ai n) X% + b M , re>re , (4.1) 

where random variables (A ( f ) ,...,A%\b M ,I M ) and (X^ ) ) n ^,...,(Xf ) ) n ^ are independent, andX[ r) 
is distributed as Xi for all r - 1, . . . ,K and i > 0. Furthermore, 7 (n) = (I^\ ..., 1^) is a vector of random 
integers in {0, ...,n- 1} and K and reo are fixed integers. 

The coefficients Ay* and b M are real random variables in the univariate case, respectively 
random 2x2 matrices and a 2-dimensional random vector in the bivariate case. We assume also 
that the coefficients are square integrable and that the following conditions hold: 

(A) (A« AWjW) JL (A 1 ,...,A k ,b), 

(B) lf =1 E[||A*A r || op ] < 1, 

(C) if^Ellj/w^lKA^A^Ilop] - as re- oo for all constants £ > 0. 
Here ||A|| op := sup| W | =1 \\Ax\\ denotes the operator norm of a matrix and A 1 the transposed matrix. 



Note that in the univariate case we just have ||A*A r || op = A 2 . In (A) we denote by — ^ conver 



h 



gence in the Wasserstein-metric of order 2 which here is equivalent to the existence of vectors 
(A ( 1 " ) ,...,A^ ) ,5 <n) ) with distribution of (A ( 1 " ) ,...,A^ ) ,6 (,l) ) such that we have the L 2 convergence 

(A[ n \...,A%\b M ) ii (A 1 ,...,A k ,b). 

Note in particular that Ai,...,Aj,fe are square integrable as well. Then we consider distributions 
of X such that 

X = ^A r X ir) + b, (4.2) 

where (Ai, . . . , Ak, b), X (1 \ . . .,X (K) are independent and X {r) are distributed as X for r - 1, . . . ,K. The 
following two results from the contraction method are used: 
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(I) Under (B) among all centered, square integrable distributions there is a unique solution 



££{X) to (gg). 



(II) Assuming (A) (B)and(C) the sequence (X n ) n >o converges in distribution to the solution ££{X) 



from (I) The convergence holds as well for the second (mixed) moments of X n . 



These results can be found in Rosier [Rosier 2001 Theorem 3] for the univariate case and Neininger 



[Neininger, 2001, Theorem 4.1] for the multivariate case 



4.2. Asymptotics of Mixed Distributions 



Checking condition |(A)| later on will involve the asymptotic behavior of mixed distributions. We 
provide the required convergence results as technical lemmas here: 

Lemma 4. Let (V\,...,Vb) be a vector of random probabilities, i.e. 0<V r <l for all r-l,...,b and 
L*=i Vr - 1 almost surely. Let 

(Li,...,L 6 ) := (L[ n \...,L[ n) ) i M(n;V 1 ,...,V b ), (4.3) 

be mixed multinomially distributed. Furthermore for J\, J% c {1, .. . , 6} let 

Z n t HypG( £ Lj, £ L J> n ) (4-4) 

be mixed hypergeometrically distributed. Then we have the L<i-convergence, as n^oo, 

^n L2 

V E ^1 ' V E ^2 



7^(1^4 (4 " 5) 



The proof is deferred to Appendix [D] For the following lemma recall that we set xln(x) := for 
x = 0. 

Lemma 5. For L\,...,L\, from Lemma [3] we have for l<i<b the L2-convergence 

^ln(^) it V t \n(Vi) as n~ 00. (4.6) 

The proof can be found in Appendix |D| 

4.3. Distributional Analysis of Yaroslavskiy's Algorithm 

We come back to Yaroslavskiy's algorithm (Algorithm [lj. For the analysis here, it proves more 
convenient to consider i. i. d. uniformly on [0, 1] distributed random variables U\,...,U n as input. As 
the actual element values do not matter, this is the same as considering a random permutation of 
{1, . . . , n}. Moreover, note that U\,...,U n are pairwise different almost surely. 

Yaroslavskiy's algorithm chooses Ui and U n as pivot elements. Denote by D = {D\,D2,Ds) the 
spacings induced by U\ and U n on the interval [0,1]; formally we have 

(D 1 ,D 2 ,D 3 ) = (£/(i), £7(2) -£/(i),l-£/ ( 2)), 

for £/(u :=min{ U\,U n } and U@) :=max{£/i,[/„} . 



It is well-known that D is uniformly distributed in the standard 2-simplex [David and Nagaraja, 
20031 P- 133f], i- e. (D lt D 2 ) has density 



1 2 for xi,X2 > a X1 + X2 < 1 
/z)(xi,x 2 ) = < 

otherwise 
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Quantity Distribution given I = (7i , I2 , 1 3 ) 



p 


= 


Ji + 1 






Q 


= 


n-I 3 






6 


= 


\A[Q]>Q) 


d 


H&) 


T A 


= 


1 






T B 


= 


^{AUeft^Alright]) 


d 


B(|) 


Tf 


= 


8 


d 


Hw*z) 


T c m 


= 


Q-2 + 8 


d 


/l + /2 + B(^) 


Tern 


= 


n-Q 


d 


h 


T c w 


= 


S + sm®^ 


d 


B{^)+B. TP QiIi + h,h,n-2) 


T s m 


= 


s@X 


d 


HypG(/i,/i+7 2) «-2) 


Tgm 


= 


P-l-s@JT 


d 


I 1 -Byj,Gih,I 1 + I 2 ,n-2) 



Table 3: Exact distributions of the toll functions introduced in Section |3.2| or equivalently the distributions 
of block execution frequencies in the first partitioning step. 

D 3 - I-D1-D2 is fully determined by (D 1 ,D 2 ). Hence, for a measurable function g:[0,l] 3 ->■ R such 
that g(Di,D2,Da) is integrable, we have that 

nl-x 1 
g{xi,X2,l-x 1 -X2)dx2dxi. (4.7) 

J 

Further we denote the sizes of the three subarrays generated in the first partitioning phase by 
jM _ (jW) j(n) j(n)) t Then we have jin) + j(n) + j(n) = n _ 2 , and conditional on D the vector I (n) has a 
multinomial distribution: 

I (n) i M{n-2;D 1 ,D 2 ,D Z ). 

We will use the short notation I M - 1 - (I1J2J3) when the dependence on n is obvious. From the 
strong law of large numbers and dominated convergence we have in particular for r e {1,2,3} 

jM L 

— -^ D r (re -00), l<p<oo. (4.8) 

re 

The advantage of this random model is that we can decouple values from ranks of pivots. With 
Z>i and D 1 + D 2 , we choose the values of p and q; however, the ranks are not yet fixed. Therefore 
given fixed pivot values, we can still independently draw non-pivot elements (with probabilities 
D\, D2 and D3 to become small, medium and large, resp.), without having to fuzz with a priori 
restrictions on the overall number of small, medium and large elements. This makes it much 
easier to compute cost contributions uniformly in pivot values than in pivot ranks. If we operate on 
random permutations of {1, . . . , re}, values and ranks coincide. So fixing pivot values there implies 
strict bounds on the number of small, medium and large elements. 

4.3.1. Distribution of Toll Functions 



In Section 3.2 we determined for each basic block of Yaroslavskiy's algorithm, how often it is 
executed in a single partitioning step. There, we only used the expected values, but the discussions 
actually characterize the full distributions. 

Most of those distributions are in fact mixed distributions, i. e. their parameters depend on the 
random variable I - (Ii,l2,h), namely the sizes of the subarrays for recursive calls. For example, 
we find that 8 conditional on the event {Ii,l2,Iz) - (Ji, J2, J3) is Bernoulli B{i 3 l{n -2)) distributed, 
which we briefly write as 8 - B(I 3 /(n - 2)). The distributions are summarized in Table[3]for later 
reference. 
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4.3.2. Key Comparisons 



We have the following asymptotic results on the variance and distribution of the number of key 
comparisons of Yaroslavskiy's algorithm: 

Theorem 5. For the number C n of key comparisons used by Yaroslavskiy's Quicksort when operating 
on a uniformly at random distributed permutation we have 

->■ C , (n-^oo), (4.9) 

n 

where the convergence is in distribution and with second moments. The distribution of C* is 
determined as the unique fixed point, subject to E[X] = and E[X 2 ] < oo, of 

, 3 

X i 1 + (D 1+ D2)(D2 + 2D 3 ) + £ [DjX^+^DjlnDj) , (4.10) 

.7=1 

where (D\,D2,D 3 ), X (1) , X {2) and X (3) are independent and X^ has the same distribution as X for 
j £ {1,2,3}. Moreover, we have, as n^oo, 

Var(C„) ~ a 2 c n 2 with a\ = ^ - §§tt 2 = 0.25901... (4.11) 

Proof: Consider the first partitioning step of Yaroslavskiy's algorithm and denote by Tc(n) the 
number of key comparisons of the first partitioning phase. By Property [l] subarrays generated in 
the first partitioning phase are, conditional on their sizes, again uniformly random permutations 
and independent of each other. Hence, we obtain the distributional recurrence 

C n = C' h + C" H + C'l + T c (n) (n > 3), (4. 12) 

where (I 1 ,I 2 ,Ia,T c (n)), (C'j)j> , (C'pj> , (Cj')j> are independent and C' p C" p C'f are identically 
distributed as Cj for j > 0. By Theorem [21 the expectation has the form E[C n ] = 19/10 n In n + cn + 
0(log«) for some constant c e U. With Cq := and 

. = C„-E[C„] (re£l) (4 _ 13) 



we have a sequence (C*) n >o of centered, square integrable random variables. Using < 4.12) > we find, 
cf. | |Hwang and Neininger| [20021 eq. (27), (28)], that (C* n ) n > satisfies <|4j) with 



I i . 3 

Af = -, b (n) = - \Tc(n)-E[C n ]+ £E[C/ r |/ r ] . 

n n V ~i ' 



We apply the framework of the contraction method outlined in Section 4.1 To check condition (A) 
note that from ( |4.8| ), we have A^ -* D r in L 2 for r = 1,2,3 as n —> oo. To identify the L2-limit of 
b (n) we look at the summands Tc(n)/n and (-E[C„] + Y? r=1 E\Ci r \I r ])/n separately. The expansion 
E[C„] = 19/10 n\nn + cn + o{n) implies E[C/ r 1 7 r ] = 19/10 I r lnl r + cI r + o(n) since we have o(I r ) - o(n). 
Plugging in these expansions, using I\ + I 2 + 1 3 = n - 2 and rearranging terms gives the asymptotic 
identity, as n — ► 00, 

-Wj + £E[C /r |/ r ]) = £§£ln£ " fi^-f +o(D. 

1 v r=l ' r=l 

Hence, Lemma [5] implies 

-[-E[CJ+fE[C /r |/ r ]) -- ^f^DjlnDj (»-oo). (4.14) 

" V r=l 7=1 
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For the limit behavior of Tc(n)/n we use the distributions listed in Tableland find 

T c (n) = T c m(n) + [T c m(n) - T s m(n)) + T c m{n) + T c Mn) + T A (n) 

= n-1 + h + I 2 + HypG(7i + 7 2 ,/3,«-2) - HypG(/i,7i + 7 2 ,«-2) + 3B(^). 

Using Lemma|4]and ( |4.8| >, we find for the normalized number of comparisons: 

T c (n) g n-2 h h HypG(7i + 7 2 ,/3,»-2) n-2 
n n n n n-2 n 

HypG(7 1 ,7 1 +7 2 ,»-2) n-2 M^) 
n-2 n n 

-^ 1 + £>! + D 2 + (D 1 +D 2 )D^ - D 1 {D 1 +D 2 ) + 
= l + (Z>i+Z> 2 )(l+Z>3-T>i) 
= l + (7) 1 +7) 2 )(7) 2 + 2Z>3). 



Altogether, we obtain that condition |(A) | holds with 

3 

E 



(Ai,A a ,A 3 ,6) = [D 1 ,D 2 ,D 3 ,l + (D 1 +D 2 )(D2 + 2D 3 )+^Y i Dj\nDj). 



For the verification o f|(B)| note that D\, 7) 2 and D3 are identically distributed with density x — 2(l-x) 
for < x < 1. This implies 

3 ^.1 

Ee[Z> 2 ] = 3E[Df] = 3/ x 2 2(l-x)dx = §<1. (4.15) 



Moreover, condition (C) is fulfilled since 



XEflf/M^HKA^VA^Ilop] S EP(7< n) <^) - 0, (n-oo), for any fixed * e N . 

r=l r=l 



Now the conclusions p)|and |(H)| give the claims C* -» C* in distribution with the characterization 
of the distribution of C*. For the asymptotic of the variance note that convergence of the second 
moment E[(C*) 2 ] — E[(C*) 2 ] and the normalization ( |4-13[ > imply 

Var(C„) ~ <4?i 2 with 0% = E[(C*) 2 ]. 

To identify o 2 c , let C* ,a) ,C* ,(2) and C*' (3) be independent copies of C* also independent of (7>i,Z> 2 ,7)3). 
We abbreviate t := l + (7)i+7) 2 )(7) 2 + 27)3)+ ^X^ =1 7) r ln7) r . Taking squares and expectations in ( |4.10| l 
and noting that E[C*] = E[C*' (r) ] = 0, we find 

3 



E[(C*) 2 ] = E[(l + (7) 1 + 7) 2 )(7) 2 + 27)3)+^^7) r ln7) r + ^7) r C*' (r) ) ] 

r=l r=l 

= E[t 2 ] + 2E[T£Z> r C*' (r) ] + E[(^D r C*' (r) ) 2 ] 



r=l r=l 

3 3 

= E[t 2 ] + 2£E[TZ) r ]E[C*] + £e[Z> 2 ]E[(C*) 2 ] + £ E[2? r £ s ]E[C*] 2 

r=l r=l l<,r,s<3 

3 r ^ S 

= E[t 2 ] + E[(C*) 2 ] E E ^ 2 ] 

r=l 

= E[t 2 ]+±E[(C*) 2 ], 
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where for the last equality ( |4.15| l was used. Solving for E[(C*) ] implies 

3 



El(C*) 2 l = 2E\[l + (D 1 + D2)(D2+2D 3 )+^Y, D J lnD j) ]• 

jf=l 



Now, the integral representation ( |4.7| l and the use of a computer algebra system yields the expres- 
sion for a 2 c . □ 

4.3.3. Swaps 

For the number of swaps in Yaroslavskiy's algorithm we have the following asymptotic behavior of 
variance and distribution. 

Theorem 6. For the number S n of swaps used by Yaroslavskiy's algorithm when operating on a 
random permutation we have 

S n -E[S n ] 

->■ S , (n^oo), (4.16) 

n 

where the convergence is in distribution and with second moments. The distribution of S* is 
determined as the unique fixed point, subject to E[X] = and E[X 2 ] < op, of 

, 3 

X I D 1 + (D 1+ D 2 )D 3 + £ [DjX^+^DjlnDj) , (4.17) 

y=l 

where (Di,D2,Ds), X w , X {2) and X (3) are independent and X (y) has the same distribution as X for 
j £ {1,2,3}. Moreover, we have, as n —> oo, 

Var(S„) ~ a 2 s n 2 , with a 2 s = jq-^ti 2 = 0.10782... (4.18) 

Proof: The proof can be done similarly to the one for Theorem [5] We have the recurrence 

S n = S' h + S' l2 + S'l' 3 + T s (n), (n > 3), (4. 19) 

with conditions on independence and distributions as in ( |4.12| l, where Ts(n) is the number of swaps 
in the first partitioning step of the algorithm. We set Sq := and 

_ S n -E[S n ] ^ (wai) (42Q) 

n 



Hence, (S*)„>o is a sequence of centered, square integrable random variables satisfying \A.1\ with 

A (n) = II^ b M = I ( r (n) _ E[S „] + £ E[S/ r I /,]) , 



where by Theorem pi we know E[S„] = |raln(ra) + c'« + 0(logra) for a constant c' £ K. Analogously 
to ( |4.14P we obtain 

-(-E[S n ]+£E[S /r |/ r ]) - f^Dj-lnDy, (n-oo), in L 2 . (4.21) 

W r=l jf=l 

It remains to study the asymptotic behavior o{Ts(n)/n. Again profiting from the spadework of 



Section [372] we find the exact distribution of the number of swaps: 

T s (n) = T s m(n) + (T c w(n) - T s w(n)) + 2T s w(n) + 2T A (n) = I 1 + sm@ ( S + S + 2 

= h + KypG(h + I 2 ,h,n-2) + B(^) + 2. 

By Lemma [4] and ( |4.8| l, we find 

Ts(rc) l 2 _ ,_ _ ._ 
► Di + (Di+D 2 )D3- 



The conditions (A)[ (B) and (C) are now checked as in the proof of Theorem [5] The assertions of 



Theorem p] follow from (I) and (II) identification of ct| analogously to a 2 c in Theorem pi □ 
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4.3.4. Executed Bytecode Instructions 

For the number of executed Java Bytecode instructions in Yaroslavskiy's algorithm we have the 
following asymptotic variance and distribution. 

Theorem 7. For the number BC n of executed Java Bytecodes used by Yaroslavskiy's algorithm 
when sorting a random permutation, we have 

BC n -E[BC n ] „ 

->■ BC , (n^oo), (4.22) 

n 

where the convergence is in distribution and with second moments. The distribution of BC* is 
determined as the unique fixed point, subject to E[X] - and E[X 2 ] < oo, of 

, 3 

X = 24 + (D 3 -9)D 2 -2D 3 (5D 3 + 2) + £ {DjX (j) + ^DjlnDj) , (4.23) 

where (Di,D 2 ,D 3 ), X (1 \ X i2) and X (3) are independent and X^ has the same distribution as X for 
j e {1,2,3}. Moreover, we have, as n -* oo, 

V & r{BC n ) ~ ol c n\ with o\ c = ±^§^ - ^TJ 2 = 42.0742. . . (4.24) 

Proof: The proof can be done very similarly as for Theorems [5] and [6] We only present the key 
points where changes are needed. For the distributional recurrence, we here have the toll function 

T BC (n) = 15T c m(n) + 10T c w(n) + UT c <«(n) + 9T s m(n) + 8T S <3)(«) 
+ 71T A (n) - lT B (n) + 6T R (n) + 3T F (n) . 

All contributions from the second line are bounded by 0(1) (see Table [5]>. Therefore they vanish in 
the limit oiTscMIn: 

TM£^± iS lb{D 1 +D 2 )+lW 3 + ll{D 1 + D 2 )D 3 + W 1 (D 1 +D 2 ) + S{D 1 -D 1 {D 1 +D 2 )) 
n 

= 24 + (D 3 -9)D 2 -2D 3 (5D 3 +2). 
The rest of the proof is carried out along the same lines as for the proofs above. □ 

4.3.5. Covariance between comparisons and swaps 

In this section we study the asymptotic covariance Cov(C n ,S n ) between the number of key compar- 
isons C n and the number of swaps S n in Yaroslavskiy's algorithm. 

Theorem 8. For the number C n of key comparisons and the number S n of swaps used by 
Yaroslavskiy's algorithm on a random permutation, we have for n^oo 

Cov(C n ,S n ) ~ a c ,sn 2 with a c ,s = ff - j^ 2 = -0.00855817... (4.25) 

The correlation coefficient of C n and S n consequently is 

p = ^g"^L_ „ -0.0512112... 

(C„ 



Proof: We define the column vector Y n := ($). Then from ( |4.12| l and ( |4.19| ), we obtain 

x Vl" x , 

[TsMj 



Y n i Y[ i+ Y^ + Y^ + {T(in) 
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with conditions on independence and distributions as in ( |4.12| ). We set Y * := ( ) and 

Y* := -(Y„-E[YJ). 



(4.26) 



Hence, (Y* )„> is a sequence of centered, square integrable, bivariate random variables satisfy- 
ing ( |4.1| l with 



i(n) 



6 



(/l) _ l[T c (n)-E[C B ] + Ej =1 E[C/ P |7 r ] 



Using the asymptotic behavior from the proofs of Theorems [5] and [6] we obtain that condition (A) 
holds with 



Or 







D r 



1 + (D 1 +D 2 )(D 2 + 2D 3 ) + 
D 1 +(D 1 + D 2 )D 3 + 



LUDjhiDj 
AnD 



5 <~j=\ u J 111±J J 



(4.27) 



Condition |(B)| is satisfied as 



3 3 

£E[||A r A r || op ] = £ELD 2 ] = 2 < I- 

r=l r=l 



Condition (C) is checked similarly as in the proof of Theorem [5] 

Hence from [(I)] we obtain the existence of a centered, square integrable, bivariate distribution 



i?(Ai, A2) that solves the bivariate fixed-point equation fl4.2| l with the choices for A r and b given 
in 04.27E . Furthermore (II) implies that the sequence (Y*) defined in ( |4.26| l converges in distribution 
and with mixed second moments towards (Ai, A2). This implies in particular, as n -> 00, 



C„-E[C„] S n -E[S„] 



E[A r A 2 ]. 



Hence, we obtain 



Cov(C„,S„) ~ E[A!A 2 ]« 2 . 



The value EIA1A2] is obtained from the fixed-point equation ( |4.2[ i with the choices for A r and b 
given in ( |4.27| l by multiplying the components on left and right hand side, taking expectations 
and solving for EIA1A2]. The integral representation ( |4- 7| > then leads to the expression given in 
d425) . D 

Note that Theorem [8] and its proof directly imply the asymptotic variance and limit distribution 
of all linear combinations aC n + f5S n , for a, (5 e U, which are, for a, /3 > 0, natural cost measures when 
weighting key comparisons against swaps. The reason is that in the proof of Theorem [8] we show 
the bivariate limit law 



C„-E[C„] S„-E[S B ] 



(Ai,A 2 ), 



n n 

which holds in distribution and with second mixed moments. Hence, the continuous mapping 
theorem implies, as n — 00, 

aC n + /3S n -(a E[C„] + j8 E[S„]) fl A 

► aA! + /3A 2 , 

in distribution and with second moments. Thus, we obtain, as n — 00, 

Var(aC„ + /3S„) = a 2 Var(C„) + A3 2 Var(S n ) + 2a/3Cov(C„,S„) 

~ (a 2 CT 2 , + /3 2 CT| + 2a/3CT CjS )« 2 . 

Note that by this approach also the covariances between all the single contributions from Table [3] 
that contribute with linear order in the first partitioning step to the number of executed Java 
Bytecodes used by Yaroslavskiy's algorithm can be identified asymptotically in first order. 
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Cost Measure 



error 



Yaroslavskiy's Quicksort 
with M = 7 



Classic Quicksort 
with M = 6 



Comparisons 
Swaps (for M - 1) 



expectation O(logn) 
std. dev. o(n) 

expectation O(logn) 



std. dev. o(n) 

Writes Accesses expectation O(logn) 

„ , i n j i expectation O(logn) 
Executed Bytecodes , , 

std. dev. o(n) 

Correlation Coefficient for 



Comparisons and Swaps 



o(l) 



1.9ralnra- 2.49976rc 
0.5089371 

0.6ti In 7i- 0.10700477 
0.32836577 

l.lnhxn- 0.40803977 

21.771 In 77- 3.5631977 
6.4864677 

-0.0512112 



2n\nn- 2.304577 a 
0.64827877 b 

0.3nln77-0.58537377 a 
0.023725177 b 

0.677ln77 + 0.31695377 a 

1877 In 77 + 6.2148877° 
3.5272377 d 



-0.86404 e 



Table 4: Summary of the results of this paper and comparison with corresponding results for classic 
Quicksort. M is chosen such that the number of executed Bytecodes in minimized. For the 
number of swaps, we only give the results for M = l as Insertionsort does not use swaps. Some 
asymptotic approximations in this table are less precise than the ones we actually compute. 
They have only been weakened for conciseness of presentation; see the corresponding sections 
for ful l precision results. T he asymptotics use the well-known expansion of Harmonic Numbers 



[e. g.|Graham et al. 



1994 



eq. (6.66)] 

a see |Sedgewick||l977] p. 334]. 

b see |Henneqtiin|[l989l p. 330]. 

c see | Wild 2012 p. 123] for Bytecode counts of classic Quicksort. 

d as in INeininger 2001, p. 515] for MIX, but with Bytecode costs of (Wiid||2012) . 

e see |Neininger||200T| Table 1]. 



5. Conclusion 

For the reader's convenience, we summarize the main results of this paper in Table [4] For 
comparison, we also collect corresponding results from previous works on classic single pivot 
Quicksort there. 

In this paper, we conducted a fully detailed analysis of Yaroslavskiy's dual pivot Quicksort — 
including the optimization of using Insertionsort on small subarrays — in the style of Knuth's book 
series The Art of Computer Programming. We give the exact expected number of executed Java 
Bytecode instructions for Yaroslavskiy's algorithm. Bytecode instructions serve merely as a sample 
of one possible detailed cost measure; implementations in different low level languages can easily 



be analyzed similarly using the block execution frequencies computed in Section 3.2 



On top of the exact average case results, we establish existence and fix point characterizations 
of limiting distribution of normalized costs. From this, we can compute moments of the limiting 
distributions, in particular the asymptotic variance of the number of executed Bytecodes. 

The mere fact that such a detailed average and distributional analysis is tractable, seems worth 
noting. 

As observed by [Wild and N ebel [2012], Yaroslavskiy's algorithm uses 5 % less key comparisons, 
but 80 % more swaps in the asymptotic average than classic Quicksort. Unless comparisons are 
very expensive, one should expect classic Quicksort to be more efficient in total. This intuition 
is confirmed by our detailed analysis: In the asymptotic average, the Java implementation of 
Yaroslavskiy's algorithm executes 20 % more Java Bytecode instructions than a corresponding 
implementation of classic Quicksort. 

Strengthening confidence in expectations, we find that asymptotic standard deviations of all 
costs remain linear in n. By Chebyshev's inequality, this implies concentration about the mean. 
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Whereas the number of comparisons in Yaroslavskiy's algorithm shows slightly less variance 
than for classic Quicksort, swaps exhibit converse behavior. In fact, the number of swaps in classic 
Quicksort is highly concentrated because it already achieves close to optimal average behavior: In 
the partitioning step of classic Quicksort, every swap puts both elements into the correct partition 
and we never revoke a placement during one partitioning step. In contrast in Yaroslavskiy's 
algorithm, every swap puts only one element into its final location (for the current partitioning 
step); the other element might have to be moved a second time later. 

Another facet of this difference is revealed by considering the correlation coefficient between 
swaps and comparisons. In classic Quicksort, swaps and comparisons are almost perfectly neg- 
atively correlated. A "good" run w r. t. comparisons needs balanced partitioning, but the more 
balanced partitioning becomes, the higher is the potential for misplaced elements that need to be 
moved. 

In Yaroslavskiy's partitioning method, such a clear dependency does not exist for several reasons. 
First of all, even if pivots have extreme ranks, sometimes many swaps are done; e. g. if p and q are 
the two largest elements, all elements are swapped in our implementation. Secondly, for some pivot 
ranks, comparisons and swaps behave covariantly: For example if p and q are the two smallest 
elements, no swap is done and every element's partition is found with one comparison only. In the 
end, the number of comparisons and swaps is almost uncorrelated in Yaroslavskiy's algorithm. 

The asymptotic standard deviation of the total number of executed Bytecode instructions is 
about twice as large in Yaroslavskiy's algorithm as in classic Quicksort. This might be a consequence 
of the higher variability in the number of swaps just described. 

Concerning practical performance, asymptotic behavior is not the full story. Often, inputs in 
practice are of moderate size and only the massive number of calls to a procedure makes it a 
bottleneck of overall execution. Then, lower order terms are not negligible. For Quicksort, this 
means in particular that constant overhead per partitioning step has to be taken into account. For 
tiny n, this overhead turns out to be so large, that it pays to switch to a simpler sorting method 
instead of Quicksort. We showed that using InsertionSort for subarrays of size at most M 
speeds up Yaroslavskiy's algorithm significantly for moderate n. The optimal choice for M w. r. t. 
the number of executed Bytecodes is M = 7. 

Combining the results for INSERTIONSORT from Appendix [B] and a corresponding Bytecode 
count analysis of a Java implementation of classic Quicksort [Wild, 2012], we can compare classic 
Quicksort and Yaroslavskiy's algorithm exactly. As striking result we observe that in expectation, 
Yaroslavskiy's algorithm needs more Java Bytecodes than classic Quicksort for all n. Thus, the 
efficiency of classic Quicksort in terms of executed Bytecodes is not just an effect of asymptotic 
approximations, it holds for realistic input sizes, as well. 

These findings clearly contradict corresponding running time experiments [Wild] [20 12 [ Chap- 
ters], where Yaroslavskiy's algorithm was significantly faster across implementations and pro- 
gramming environments. 

One might object that the poor performance of Yaroslavskiy's algorithm is a peculiarity of 
counting Bytecode instructions. |Wild| | |2012| Section 7.1] also gives implementations and analyses 



thereof in MMIX, the new version of Knuth's imaginary processor architecture. Every MMIX 
instruction has well-defined costs, chosen to closely resemble actual execution time on a simple 
processor. The results show the same trend: Classic Quicksort is more efficient. Together with the 
Bytecode results of this paper, we see strong evidence for the following conjecture: 

Conjecture 1. The efficiency of Yaroslavskiy's algorithm in practice is caused by advanced features 
of modern processors. In models that assign constant cost contributions to single instructions — i. e. 
locality of memory accesses and instruction pipelining are ignored — classic Quicksort is more 
efficient. 

It will be the subject of future investigations to identify the true reason of the success of 
Yaroslavskiy's dual pivot Quicksort. 
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A. Solving the Dual Pivot Quicksort Recurrence 

APPENDIX 

A. Solving the Dual Pivot Quicksort Recurrence 

The proof presented in the following is basically a generalization of the derivation given by 
Sedgewick [1975, p. 156ffJ. [Hennequin] | |1991[ gives an alternative approach based on generating 
functions that is much more general. Even though the authors consider Hennequin's method 
elegant, we prefer the elementary proof, as it allows a self-contained presentation. 

Two basic identities involving binomials and Harmonic Numbers are used several times below, 
so we collect them here. They are found as equations (6.70) and (5.10) in [Grah am et al. 1994) . 



L O^k = UJ^-^i), integer m>0, (A.l) 

0<&<n 

I (*) = CA), integers n,m>0. (A.2) 

0<£<ra 

Proof of Theorem^ The first step is to use symmetries of the sum in ( |3.2| l. 

£ (Cp-i + Cf-p-i + C-,) = X>-/>)Cp-i + X>-1-*)C* + ^(q-DCn-q 

l<p<<7<« p=l k=0 q=2 

n-2 

= 3j^(n-k-l)C k . 

k=o 

So, our recurrence to solve is 

n-2 

C n = Tn + ^^Y^in-k-DCk, iovn>M. (A.3) 

We first consider D n := [^ )C n+ i - {^\C n to get rid of the factor in front of the sum: 

d(n):= 

, * N 

D n = { n f)T n+1 -QT n (n>M + 2) 

+ {n+ 2 ) \n"l)n i:(n-k)C k - B( V 1) B o.-l) I("-*-l)C* 
k=0 k=0 

n-1 

= din) + 3 X C k . 

The remaining full history recurrence is eliminated by taking ordinary differences 

E n :=D n+1 -D n = d(n + l)-d(n) + 3C n . (n>M + 2) 
Towards a telescoping recurrence, we consider F n := C n - B j^ ■ C n -i , and compute 

Fj7 f~i n—2/~i i r^ n—3/~i \ 

n+2~^n+l - W+2 - f^2^n+l~ (^n+l~ T^J^raJ 

= C„ +2 -^C„ +1 +^fC„. (A.4) 

The expression on the right hand side in itself is not helpful. However, by expanding the definition 
of E n , we find 

(E n -3C n )/{ n+ 2 2 ) = (D n+1 -D n -3C n )/[ n f) 

= [[ n+ 2 2 )Cn+2 ~ [ n l X )C n +l ~ [{ n 2 1 ) C n+l ~ QCn) ~ 3C„]/ ("J 2 ) 

- r - _2"_r' -i- a"'"" 1 )" 3 n 

- O ra+2 n+2 ^n+l+ l (n+2)(n _i) U » 

r -JZLr l(n-3)(n+2d 

- ^n+2 n+2^n+l+ i (n+2Xn _ 1} ^n 

= C„ +2 - T^C^+i + ^f C„ . (A.5) 



27 



A. Solving the Dual Pivot Quicksort Recurrence 

Equating < \AA\ and ( |A.5[ > yields 

F n+2 -F n+1 = (E n -3C n )/{ n+ 2 2 ) = {d(n + l)-d(n))/{ n+ 2 2 ) . (n>M + 2) 

s , ' 

This last equation is now amenable to simple iteration: 

F n = Z fd-2) + F M+3 . (n>M + 4) 

i=M+4 

v , ' 

g(n) 

Plugging in the definition of F n -C n - ^ • C n -i yields 

C n = ^-Cn-i + gin). («>M + 4) (A.6) 

Multiplying ( |A.6| l by (']) and using (™) • ^ = C*" 1 ) gives a telescoping recurrence: 
G n := { n 4 )C n = G n - 1 + {1)gW 

= E Qgd) + G M+ s 

i=M+4 =0 

, * ^ 

n M+3 i 

= E Qeo) + G M+3 - E (D E Aj-2) + f m+3 

= E (D*(*) + Gm + 3 - ( M t) F M + 3 , (A.7) 

where the last equation uses ( 1A.2| ). Applying definitions, we find 



i-2 



L \JgM = L UJKM+3 + L 77^ 

i = l j=l j=M+2 [ 2 ) 



(i) 



= (";K3 + E Q) E [tj+2 - %T j+1 + -M-rJ . (A8) 

i=M+4 j=M+2 [ 2 J 

Using ( |A.8| l in ( |A.7| l, we finally arrive at the explicit formula for C ra valid for n > M + 3: 

r +1 ) i n ■ i ~ 2 i i j ) 

C n = -^-F M+ 3 + pJT T. ii) T. [ T J+2 ~ jk T J+l + J]T¥: T j) 
UJ UJj=M+4 J=M+2 V [ J D ' 

n (M+4\ 7-. 

&M+3 ~ (. 5 J-PM+3 

+ (!) ' 

Expanding F and G according to their definition gives 



\ n . i-2 U\ 

TTvT E (4) E ( T J+2 " JT2 T i+1 + TU2~: T Jj 
UJ i=M+4 j=M+2 V (■'2 J 



+ 



2 

iM+3~\ _ iM+i\ iM+4\ 

(n+1 , U J I 5 ) \ n ftf-l (n+1 [ 5 )_ } n 

[~5~ + pn j^M+3 ~ Af+3"l~5 pn _ J'-M4 



G) ' M+3[ 5 G) 

which concludes the proof. □ 
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A. Solving the Dual Pivot Quicksort Recurrence 

Proof of Proposition^ Of course, we start with the closed form ( |3.3| l from Theorem [TJ which 
consists of the double sum and two terms involving "base cases" Cm+2 and Cm+3- 



C n = pn L (4). L ( r j+2 " JT2 T J+1 + 7j+2\ T j) 



rn\ L-, \4) L-, 'J+' ;+2"^ +i ' /7+2v 
(4) i=M+4 j=M+2 ( 2 J 

M+3l rM+4i , M !- 1 , 

+2 



(n+1 , v 4 J I 5 J ^ M-l fn+1 LAJ'lr' 



P 

We first focus on the sums. Assuming the even more general form 

partial fraction decomposition of the innermost term yields 

ni := (T J+2 - jL Tj+1 + £)/(*■) Tj)= B^_2_^ 

Note that contributions from -^ , ^ and n( °i 1) cancel out. This allows to write the inner sum in 
terms of Harmonic Numbers: 

i-2 

£ r\j = (8a-2fe)(^fj-^k + 3)-(2a-26)(^-i-^k +2 ) 

J=M+2 

= 6a(^-^k+3) + (2o-26)(f- ] ^3). (A.9) 

(The second equation uses the basic fact J?k-i - -^k - \-) 

Using HA.1) , ( |A.2| ) and the absorption property of binomials (£) = (ilj)f , one obtains 

1 n i~% Rn 1 i 

^ E ( 4 ) I i> = S(fH + i4)-(T)» + 4-l 

UJ i=M+4 j=M+2 I4J 

2aj-26 n ~ tl(i-fr Wi^ 

in^. L, Ul 3 I M+3 Wj 

(4) i=M+3 



+ 

u 

Qa 



3P l( n+l \ (M+4\\ 
J^^M + 3{{ 5 J-l 5 JJ 

iM+4} 

a(n + l)(je n+1 -\) - Qa^-^-(je M+ 4-\) 



6 



+ 2 ^p(KG)-(T))-^3(r5 1 )-( M 5 +4 ))) 



■6a^k +3 (^-r 5 +4 )/a)) 



lain + l)Je n+1 - 2J* (6a^f M+ 3 + ^f + fa) + ^ 

fM+4 



cM+4\ 

— — (fa - 6a(^f M +4 - ^k+3) - ^- mu + ^mw) 



lain + l)je n+1 - 2ji (6a^f M+ 3 + ^f + §0) 

fM+4 



fiW+4-l 
a-b y 5 J f6 2a-26 56-17cn (Am) 

^ 2 * fm U u ^ Af+3 ^2(M+4)J- Ui.-LU; 



It remains to consider the second and third summands of < |3.3[ > 

rAf+3 , i _ r M+4\ iM+4\ 
„ ._ fn+1 , v 4 J I 5 ) \ r , M-l fra+1 U)]n 
P - ("5" + m J^M+3 " M^3{~5 77n-J^M+2- 
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A. Solving the Dual Pivot Quicksort Recurrence 



We start by applying definition ( |3.2| ) twice and using C„ = Cjf for n < M to expand Cm+z 

3 M+l 
I 2 J *=° 



C M+3 = T M+3 + -^-3- £ (M + 2-k)C k 



3 3 *f-i •. 3 M 

r M + 3 + ^^1 + ^ I (M- fe)Cf + -^- £(M + 2-*)C 

(2 ) I 2 J *=<> I 2 J *=° 



Tm + 3 + -^T M+1 + » L ( W + 2 -« + ?^j*))cf (A.11) 

I 2 J (2 J*=o v I 2 J 



and C M+ 2 



3 



M 



C M+2 = Tm +2 + jj^r Y,(M+l-k)Cf. (A. 12) 

( 2 Jfc=0 

Equations QA. 11[ > and ( |A.12| ) are now inserted into the second and third summands of ( |3.3| l. With 
T n - an + b for n > M + 1, this yields 

I 2 J 
, B+ if f 3(M + 2-fe) 9(M-j) M^ UM+l-kh js 

5 Z^ I fAf+3-i fAf+lifM+3-\ M+3 f M+2-\ J * 

k=0" { 2 J I 2 K 2 J I 2 J 

+ -pp((jra " DCm+3 + ^gCM+2j 

- » + lfc„ 1 6(6-a) , 2(4a-6) 1 , n+1 y % M ~ 2k r lS (Al^ 

~ "5"P a+ "M?2" + M+3 I + "5" 2- (M+2) ^ k (A.lcSJ 



, v 5 > 1 M-l n M-l ri \ 

+ pj lM+3 UAf+2_ M+4 L/ * f+3 i 

Adding ( |A.10| > and ( |A.13| l finally yields the claimed representation 



M 'iM — Ob 
r - 6„/„, ,iM , ra+lr!9„ 1 6(6-0) «„ ^e 1 , 0-6 , n+1 V /^ IS 



(•Af+4i 
1 v 5 > (6„ , 2(0-6) , 56-17a M-l n , M-l n 1 

+ rn\ U a+ Af+3 + 2(M+4) M+4^M+3 + M+3 ^M+2) ■ 



=0(3i 



For the asymptotic representation ( |3.5| l of C n , the penultimate summand is because of the 
assumption Cjf = 0. The last summand is in ®(n~ 4 ) and therefore vanishes in the O(-) term (we 
assume M = 0(1) as n -* 00 to be constant). Now, replacing J€ n by its well-known asymptotic 
estimate 



J€ n = ln(«) + y+i« + 0(^) [Graham etaL|[l99^ i eg. (6.66)] 



and expanding terms in ( |3.4| l directly yields ( |3.5| ). 

Finally, the case Ti - ^ 2a + b affects the derivation only at a single point: As M > 1, the only 
occurring toll function that can ever equal Ti is Tm+i, which occurs only in Cm+3, see ( |A.11| ). In p, 
we multiply C M +3 by (^ + (( M 4 +3 ) - [ M ^))/{1)) = §(n + D + 0(«" 4 ). Consequently, we have to subtract 

5Mi(5(» + l)-yjj^3r(2a + 6) + 0(«- 4 )) = <5 m ^±*(n + l) + 0(«" 4 ) . 

(. 2 J 

(The second equation follows by setting M = 1.) 

This concludes the proof of Proposition [l] □ 
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B. Insertionsort 



Algorithm 2 Insertionsort as given and analyzed by Knuth [19981 



lNSERT10NSORT(k,left, right) 

1 for i = left + 1 , . . . , right 

2 j := i-1; v := A[i] 

3 while y' > left a v < k[j] 

4 AU + l]:=Atf]; j 

5 end while 
e AL/ + l]:=u 
7 end for 



7-1 



2a 



i := fe/i+l; 















2b G 








i < right 




















no 


yes 






no 














yes 


2e E 






2c G-I 








2d E + D 


Alj+1] := A{Ji; 

j<left 


j:= i-l; 
u ■- A[i]; 




v<A[j] 








no 














yes 










2f G-I-D 












Goto; 






















2g G-I 






2h / 






A{J + l\ = v; 




Return 



















Figure 3: Control flow graph of our Java implementation of InsertionSort (Algorithm^. The block 
names "2a"-"2h" indicate that these blocks replace block|2]in Figure[T] this figure provides a 
"close-up view" of block[g] Blocks|2f]and|2h]contain control flow statements needed in Java 
Bytecode, which would normally be represented by arrows only. They are shown to remind us 
of their cost contributions. 



B. Insertionsort 



In this section, we consider in some detail the InsertionSort procedure used for sorting small 
subarrays. Insertionsort is a primitive sorting algorithm with quadratic running time in both worst 
and average case. On very small arrays however, it is extremely efficient, which makes it a good 
choice for our purpose. 

Our implementation of INSERTIONSORT is given as Algorithm [2] and its control flow graph 
is shown in Figure [3] Algorithm [2] is based on the implementation by | Knuth 1 1998 Program 
S]. Knuth assumes n > 2 in his code and analysis, but our Quicksort implementation also calls 
INSERTIONSORT on subarrays of lengths or 1. Therefore, Algorithm [2] starts with an index 
comparison "i < right" to handle these cases. 

Figure [3] lists the execution frequencies of all basic blocks. The names are chosen to match 
the corresponding notation of Sedgewick [1977| and denote the total execution frequencies across 
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C. Low-Level Implementations and Instruction Counts 



Block 
# Bytecodes 



? ? e 1 1 5 ? i % % 



Block [L^ [L3] 
# Bytecodes <F 12 



15 16 



18 19 20 



Block 
# Bytecodes 



2e 2fl 2e 



Table 5: Bytecode counts for the basic blocks of Figures[T]and[3] 



all invocations of InsertionSort on small subarrays caused by one initial call to QuiCKSORT- 
Yaroslavskiy. Let us define I, G, D and E to denote the frequencies when we use InsertionSort 
in isolation for sorting a random permutation. These frequencies are analyzed by Knuth 1 1998 , 
p. 82] for n > 2. As mentioned above, our implementation has to work for n > 0, so our analysis must 
take the special cases n e {0, 1} into account. We find 



I(n) = 1, 



G(n) = n + 8 n (i, 



DM = n-je n 



and 



EM = G)/2. 



We can compute I, G, D and E by inserting I, G, D and E for C IS in the solution provided by 
Proposition [l] on page [7] 



M 



I(n) = i(n + l)---5-^(3M-2A)/(A) = 
I 3 )*=0 



12 



5(M+2) 



(n + D, 



(B.l) 



D(re) = (1 + 



12 



5(M+2) 5(M+2) 



J5fM+l)(ra + D, 



EM = {hM +w ^-^)(n + l). 

Using these frequencies, we can easily express the expected number of key comparisons, write 
accesses and executed Bytecode instructions: The only place where key comparisons occur, is in 
block 2d so C (IS) -D+E. Write accesses to array A happen in blocks |2e] and 2g giving W (/S) = 
E + {G- 1). The number of Bytecodes is given in the next section. 



C. Low-Level Implementations and Instruction Counts 

Figure [4] shows the Java implementation of Yaroslavskiy's algorithm whose Bytecode counts 
are studied in this paper. The partitioning loop is taken from the original sources of the Java 
7 Runtime Environment library (see for example http: //www, doc jar .com/html /api/ Java /| 
|util/DualP ivot Quicksor t . Java . htm l). 

The Java code has been compiled using Oracle's Java Compiler (javac version 1.7.0_17). The 
resulting Java Bytecode was decomposed into the basic blocks of Figures [l] and [3] Then, for 
each block the number of Bytecode instructions was counted, the result is given in Table [5] We 
have automated this process as part of our tool MaLiJAn (Maximum Likelihood Java Analyzer), 
which provides a means of automating empirical studies of algorithms based on their control flow 
graphs | |Laube and Nebel| |2010 ; Wild eTaT||20T3) . 

By multiplying the Bytecodes per block with the block's frequency, we get the overall number of 
executed Bytecodes. For Yaroslavskiy's Quicksort, we get 

BC (QS) _ 5R + 7A + 8B + 9(A _ B)+ 10A + 3(A + c (1) ) + 7C (1) + 12S (1) + 3(C (1) - S a) ) 



+ 5C (3) + 3(C (3) - C (4) + F) + 2(C (3) - C (4) ) + 5C (4) + 6(C (4) 
+ 5C (4) + 2C (1) + 42A + lfl 
71A -1B + 6R + 15C (1) + 10C (3) + 11C <4) + 9S a) + 8S (3) + 3F . 



S {3) ) + 14S 



(3) 



(C.l) 
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void quicksortYaroslavskiy (int[] A, int left, int right) { 
if (right - left < M) { 

insertionsort (A, left, right); 
} else { 

final int p, q; 

if (A[left] > Alright] ) { 

p = A[right]; q = A[left]; 
} else { 

p = A[left]; q = Alright]; 
} 

int 1 = left + 1, g = right - 1, k = 1; 
while ( k <= g ) { 

final int ak = A[k] ; 
if (ak < p) { 

A[k] = A[l]; A[l] = ak; ++1; 
} else if (ak >= q) { 

while (A[g] > q && k < g) 

—g; 

if (A[g] < p) { 

A[k] = A[l]; A[l] = A[g]; ++1; 
} else { 

A[k] = A[g]; 

] 

A[g] = ak; —g; 

} 

-++k; 
} 

— 1; + + g; 

A[left] = A[l]; A[l] = p; A[right] = A[g] ; A[g] = q; 
quicksortYaroslavskiy (A, left, 1 - 1 ) ; 
quicksort Yaroslavskiy (A, g + 1, right); 
quicksortYaroslavskiy (A, 1 + 1, g - 1) ; 
} 
} 

void insertionsort (int [ ] A, int left, int right) { 
for (int i = left +1; i <= right; ++i) { 
int j = i-1; final int v = A[i] ; 
while (v < A[ j] ) { 

A[j+1] = A[j]; —j; 
if (j < left) break; 
} 
A[j+1] = v; 



Figure 4: Java implementation of Yaroslavskiy's algorithm. 
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D. Proofs for Mixed Distribution Asymptotics 



Additionally, we have for InsertionSort 

BC iIS) = 8I + 3G + 8(G-I) + 5(E + D) + 12E+1(G-I-D) + 8(G-I) + 2I 
= 4D + 17E + 20G-7I . 



(C.2) 



Note that Wild |2012| investigated a different (more naive) Java implementation of 
Yaroslavskiy's algorithm and hence reports different Bytecode counts. 



D. Proofs for Mixed Distribution Asymptotics 



This section gives the proofs for the two technical lemmas from Section 4.2 on the asymptotic 
behavior of mixed distributions. We denote by Bin(n,p) the binomial distribution with n trials and 
success probability p. 

Proof of Lemma^ The hypergeometric distribution HypG(&, r,r + b) has mean and variance 
given in ( |2.2| l. In particular for sequences (a n ) n >i, (/3 ra ) ra >i with a n - an + r n and f> n - f>n + s n with 
a,/3e(0, 1) and |r„|,|s„| <n 2/3 , we obtain for hypergeometrically HypG(a„,/3„,«) distributed random 
variables Y n that Var(Y„) < n and moreover 



n a 

ap 

n 



Yn EY n 

n 



n 



EY„ 



-a/3 



VVsi(Y n ) 



+ 



OLnPn 



a/3 



_ 1/2 a\s n \ (}\r n \ \r n s n \ 
< n H 1 1 ^ — 



< n- 1/2 + 2n- 1/3 + n- m < An~ m . 



(D.l) 



Now, we first condition on V:=(Vi,...,V&) = (i;i,...,i;6)=:i;, where v satisfies vt e[0, 1] and Y.^ =1 v r - 1. 
Conditionally on V - v, the random variables T.j £ j t Lj have a binomial Bin(n,wt) distribution, where 
u>i :- T.jej t Vj for i = 1,2. We denote 



K ■■= n{Y 1 L j z{nw i -n 2/3 ,nw i +n m ]}, 



j=l j£j, 



Then by Chernoff's bound [ McDiar mid| 1 1998] p. 195], we obtain uniformly in v that 



P(B v n ) > l-4exp(-2« 1M ) 



as n — ► oo . 



(D.2) 



We abbreviate W; := Zjej,^ for i = 1,2 and let ZJ^ denote Z„ conditional onV = D. By Py we denote 
the distribution of V. Then, we obtain with ( |D.l| l 



Z„ 



-WiW 2 



/ 
II \ ZA 

jjbi n 



n 



-W\W2 



-W\W<1 



dP v (v) 



dP> + 4exp(-2« 1M )- max\^ -wiw 2 \ dP v (v) 

0<z<n n 



<4«" 1/3 +4exp(-2« 1/3 ) 
—►0, as n — ► 00, 

which concludes the proof. □ 

Proof of Lemma |5f Let i e {l,...,re} be arbitrary. The strong law of large numbers implies 
that Li/n^Vi almost surely as n -* 00. The function x >-* xln(x) is continuous on [0, 1] (with the 
convention xln(x) = for x - 0). Hence, as n —■ 00, 

\T T I 2 

^ln(^)-V;lnVi — almost surely . 
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E. Experimental Validation of Asymptotics 
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Figure 5: Comparison of sample means with the asymptotics computed in Section[3] The asymptotics 
are shown as solid line |— *— | the sample means are indicated by circles [3 Additionally, the 
error bars show the sample standard deviation. On the x-axis, the input size n is shown, the 
y-axis shows the corresponding counts normalized by re Inn. 
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Figure 6: Comparison of sample variances with the asymptotics computed in Section|4] The asymptotics 
are shown as solid line — x— , the sample variances are indicated by circles o. On the x-axis, 
the input size n is shown, the y-axis shows the corresponding variances normalized by n 2 . 



Since the non-positive function x >-» xln(x), x e [0, 1] is lower bounded (e. g. by -lie) the square in 
the latter display is uniformly bounded (e. g. by (2/e) 2 ). Hence the dominated convergence theorem 
implies 



E [|^ ln ^)- y * lnV 'f] - °. 



as n — <■ oo . 



This concludes the proof. 



□ 



E. Experimental Validation of Asymptotics 

In this paper, we computed asymptotics for mean and variance of the costs of Yaroslavskiy's 
algorithm. Whereas the results for the mean are very precise and indeed can be made exact with 
some additional diligence, our contraction arguments only provide leading term asymptotics. In 
this section, we compare the asymptotic approximations with experimental sample means and 
variances. 

We use the Java implementation given in Appendix [C] and run it on 10000 pseudo-randomly 
generated permutations of {l,...,ra} for each of the 20 sizes in {10 5 ,2 10 5 ,...,2 10 6 }. Note that for 
sensible estimates of variances, much larger samples are needed than for means. The experiment 
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Figure 7: Histograms for the distributions of the normalized number of comparisons C* , swaps S* and 
executed Bytecode instructions SC* from the sample of 10000 random permutations of size 

re = 10 6 . 




_?z_=3 

n — 5 



re = 10 



re = 15 



w = 20 



mean 



Figure 8: This figure shows the distributions of the normalized number of comparisons for small n. The 
distributions are computed by unfolding the distributional recurrence < |4.12| >. This figure serves 
as visual evidence for the convergence of cost distrib utions to a limit law It is h eavily inspired 
by a corresponding figure for classic Quicksort [Sedgewick and Flajolet 1996 Figure 1 .3]. 



itself is done using our tool MaLiJAn, which automatically counts the number of comparisons, 
swaps and Bytecode instructions [Wild et~aL||2013) . 

Figure [5] shows the results for the expected costs and Figure [6] compares asymptotic and sampled 
variances. The histograms in Figure [7] give some impression how the limit laws will look like. 

It is clearly visible in Figure [5] that for the given range of input sizes, the average costs 
computed in Section [3] are extremely precise. In fact, hardly any deviation between prediction and 
measurement is visible. The variances in Figure [6] show more erratic behavior. As variances are 
much harder to estimate than means, this does not come as a surprise. From the data we cannot 
tell whether the true variances show some oscillatory behavior (in lower order terms) or whether 
we observe sampling noise. Nevertheless, Figure [6] shows that for the given range of sizes, the 
asymptotic is a sensible approximation of the exact variance. 

Figure [8] shows how fast the exact probability distribution of the normalized number of compar- 
isons approaches a smooth limiting shape even for tiny n. This strengthens the above quantitative 
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arguments that the limiting distributions computed in this paper are useful approximations of the 
true behavior of costs in Yaroslavskiy's algorithm. 
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